Getting Started

System Requirements

Operating System	Windows 10/11 (64-bit)
RAM	4 GB minimum, 8 GB recommended (16 GB for local AI)
Disk Space	500 MB for the app, plus space for AI models if running locally
GPU (optional)	NVIDIA GPU with CUDA support for faster local AI inference
Network	Internet required for cloud AI providers. Not required for local AI.

Installation

AumaTron is available as a Windows installer or a portable package.

Windows Installer (Recommended)

Download the installer from aumatron.com/download
Run the installer — it bundles Node.js, so no pre-installation needed
AumaTron installs to your AppData folder and creates a desktop shortcut
Launch AumaTron from the desktop shortcut or Start menu

Portable Package

Download the portable ZIP from aumatron.com/download
Extract to any folder on your machine
Ensure Node.js 18+ is installed on your system
Open a terminal in the extracted folder and run npm install
Start with npm start or node server/index.js

First Run Setup

On first launch, AumaTron automatically:

Generates a .env file with secure random keys (ENCRYPTION_KEY, SESSION_SECRET, JWT_SECRET)
Creates the SQLite database and runs all migrations
Starts the server on port 3000

Open your browser and navigate to http://localhost:3000. You'll be prompted to create your first account using an invite code.

Invite Codes: AumaTron uses invite-based registration. The first user is created during setup. Additional users need an invite code generated from Settings → Manage Invites.

Setting Your Workspace

The workspace is the root folder where AumaTron reads and writes files. All file operations (AI tools, file browser, uploads) are restricted to this folder for security.

Go to Settings → General
Under Workspace Path, enter the folder path (e.g. C:\Users\YourName\AumaTron-Files)
Click Save

Tip: If you want AumaTron to work with downloaded files, set your workspace to include your Downloads folder, or set it to a parent folder that covers both your working files and downloads.

Quick Start

Once installed, here's the fastest way to see AumaTron in action:

Set an AI provider — Go to Settings → AI Providers. For the quickest start, paste your OpenAI API key. For free/private, set up a local model (see Local AI).
Send a message — Type something in the chat like: "Go to wikipedia.org and tell me today's featured article"
Watch it work — AumaTron will launch a browser, navigate to the site, extract the content, and respond in chat.

That's it. You're automating with AI.

AI Providers

Local AI (Llama)

Run AI completely offline on your own hardware. No API costs, no data leaving your machine.

Setup

Download a GGUF model file (e.g. Llama 3, Mistral, Phi-3) from Hugging Face or similar
Go to Settings → AI Providers → Local AI
Set the path to your GGUF model file
Configure GPU layers (higher = faster, but uses more VRAM)
Click Save — AumaTron will start the llama.cpp server automatically

Recommended Models

Model	Size	RAM Needed	Best For
Phi-3 Mini (Q4)	~2 GB	4 GB	Light tasks, low-resource machines
Llama 3 8B (Q4)	~4.5 GB	8 GB	General use, good balance
Mistral 7B (Q5)	~5 GB	8 GB	Reliable all-rounder
Llama 3 70B (Q4)	~40 GB	48 GB+	Maximum quality (needs powerful hardware)

Available on all plans. Local AI is the default provider on the Free plan. No API key needed.

OpenAI, Claude, Deepseek

Cloud providers offer more powerful models without needing local GPU resources. Available on all plans — just add your API key. Custom providers (Groq, Mistral, etc.) require a Pro plan or higher.

OpenAI

Get an API key from platform.openai.com
Go to Settings → AI Providers → OpenAI
Paste your API key and select your preferred model (GPT-4o, GPT-4o-mini, etc.)

Anthropic Claude

Get an API key from console.anthropic.com
Go to Settings → AI Providers → Claude
Paste your API key and select a model (Claude Sonnet, Haiku, etc.)

Deepseek

Get an API key from platform.deepseek.com
Go to Settings → AI Providers → Deepseek
Paste your API key

Your keys are encrypted. All API keys are stored with AES-256-GCM encryption in the local database. They never leave your machine.

Custom Providers

Add any OpenAI-compatible API endpoint as a custom provider. This works with Groq, Mistral, Together AI, Ollama, LM Studio, and many others. Pro+

Go to Settings → AI Providers → Custom Providers
Click Add Provider
Enter a name, the API endpoint URL, your API key, and the model name
Test the connection, then save

Any provider that follows the OpenAI chat completions API format (/v1/chat/completions) will work.

Choosing the Right Provider

Scenario	Recommended	Why
Maximum privacy	Local AI (Llama)	Nothing leaves your machine
Best quality	Claude or GPT-4o	Most capable at complex tasks
Lowest cost	Deepseek or Local	Deepseek is very affordable; Local is free
Fastest response	Groq (custom)	Extremely fast inference
Simple tasks	GPT-4o-mini or Phi-3	Cheap/free, good enough for straightforward work

Browser Automation

How It Works

AumaTron controls real browsers using Playwright and Puppeteer. When you ask the AI to interact with a website, it:

Launches a browser window (or uses an existing persistent session)
Navigates to the URL you specify
Reads the page content and determines the best action
Clicks buttons, fills forms, scrolls, extracts data — whatever your instruction requires
Reports back the results in chat

No scripting or coding required. Just describe what you want in plain language.

Example prompts

"Go to my Shopify admin and check how many orders came in today"
"Log in to WordPress and publish the draft post called 'Summer Sale'"
"Search Google for 'best project management tools' and give me the top 5 results"

Browser Types

Browser	Plan	Best For
Firefox (Playwright)	Free+	General browsing, good compatibility
Chrome Incognito (Puppeteer)	Pro+	Sites that require Chrome, stealth mode
Tor	Pro+	Anonymous browsing, accessing .onion sites
Tor Standalone	Pro+	Tor without needing a system Tor install
Headless Mode	Ultra	No visible window, faster, uses less resources

Select your default browser in Settings → General → Browser Settings. Pro+ plans can switch between browsers at any time.

Saved Sites & Credentials

Save frequently visited websites with their login credentials so the AI can log in automatically.

Go to Settings → Saved Sites
Click Add Site
Enter the site URL, username, and password
Optionally add custom instructions (e.g. "Always click 'Accept Cookies' first")
Save

Now when you mention the site in chat or Telegram, AumaTron automatically uses the saved credentials to log in. All passwords are encrypted with AES-256-GCM.

WordPress Mode

For WordPress sites, enable WordPress mode on the saved site. This provides dedicated support for WP admin operations like publishing posts, managing plugins, and checking updates.

Stealth & Anti-Detection

Chrome Incognito mode includes anti-detection stealth features:

Fingerprint spoofing (WebGL, Canvas, AudioContext)
Realistic browser headers and navigator properties
Human-like mouse movement and typing delays
Persistent session cookies across runs

This helps avoid bot detection on sites that actively check for automated browsers.

Tips for Reliable Automation

Be specific: "Click the blue 'Submit Order' button" works better than "submit the form"
Use saved sites: Pre-saved credentials eliminate login failures
Add pre-actions: Configure sites to dismiss popups or accept cookies before the main task
Use persistent sessions: Staying logged in avoids 2FA prompts and CAPTCHAs on repeat visits (Pro+)
Check screenshots: If a task fails, the AI often takes a screenshot showing what went wrong

Task Scheduler & Workflows

Creating Tasks

Scheduled tasks let AumaTron run automations on a recurring basis without your involvement.

Go to Settings → Scheduler
Click New Task
Enter a name and the prompt (natural language instruction for the AI)
Set the schedule (daily, weekly, or custom cron expression)
Optionally select a specific AI provider for this task
Save and enable the task

Example task prompts

"Log in to my Shopify store, go to Orders, and write today's order count to orders-log.txt"
"Go to the receipts folder, OCR all new files, extract the totals, and append them to expense-summary.txt"
"Check my WordPress site for pending plugin updates and send a summary"

Free plan: Up to 5 scheduled tasks. Pro+: Unlimited tasks.

Cron Scheduling

Tasks use cron-style scheduling. Common examples:

Schedule	Cron Expression	Meaning
Every day at 9 AM	`0 9 * * *`	Runs once daily at 9:00
Every Monday at 8 AM	`0 8 * * 1`	Weekly on Monday morning
Every 6 hours	`0 /6 * *`	Runs at midnight, 6 AM, noon, 6 PM
Every 30 minutes	`/30 * * *`	Runs twice per hour (Pro+)
Weekdays at 5 PM	`0 17 * * 1-5`	Mon–Fri at 5:00 PM

Advanced scheduling (minute and hour intervals) requires a Pro plan or higher.

Multi-Step Workflows

Workflows chain multiple tasks into a sequence that runs automatically. Each step executes in order, and the workflow loops until its time limit is reached.

Create a new task and enable Workflow mode
Add steps — each step is a prompt that the AI executes
Set a time limit for the overall workflow
Save and schedule it like any other task

Workflows are useful for multi-part jobs like: scan a folder for receipts → OCR each one → write a summary → archive processed files.

Task Chaining

Trigger a follow-up task automatically when another task completes. Pro+

Open an existing task
Under On Completion, select another task to trigger
The chained task starts as soon as the first one finishes

This lets you build pipelines: Task A collects data → Task B processes it → Task C sends a report.

Per-Task AI Models

Assign different AI providers and models to individual tasks. Pro+

This is useful for cost optimization:

Use a cheap/fast model (GPT-4o-mini, local Llama) for simple repetitive tasks
Use a powerful model (GPT-4o, Claude) for complex tasks that need reasoning
Use Deepseek for tasks where cost is the primary concern

Telegram Bot

Setup Guide

Open Telegram and search for @BotFather
Send /newbot and follow the prompts to create a bot
Copy the bot token that BotFather gives you
In AumaTron, go to Settings → General → Telegram
Paste the bot token and click Save
Send a message to your new bot in Telegram — AumaTron will respond

Available on all plans. Ultra plan supports multiple simultaneous Telegram bots.

Commands

You can send natural language messages to your bot just like chatting in the AumaTron web interface. The AI has full access to all tools — browser automation, file management, and OCR.

Example Telegram messages

"Check my Shopify orders and send me a screenshot"
"Read the file expense-summary.txt and tell me this month's total"
"Go to my WordPress site and check if there are any pending comments"

Remote Control

Telegram gives you full remote access to AumaTron from your phone. Some things you can do:

Browse websites and receive screenshots
Manage files in your workspace
Trigger automations on demand
Receive task completion notifications

Smart credential injection means that when you mention a saved site URL in Telegram, AumaTron automatically uses the stored login details.

File Management

Workspace Path

All file operations are scoped to your workspace folder. This is a security measure — the AI cannot read or write files outside this folder.

Set your workspace in Settings → General → Workspace Path.

Downloaded files: Files downloaded by the browser go to your OS default downloads folder (e.g. C:\Users\YourName\Downloads). AumaTron does not override the browser's download directory. To work with downloaded files, set your workspace to a parent folder that includes your Downloads folder.

File Browser Panel

Click Files in the header to open the file browser panel. It provides:

Directory navigation with breadcrumb trail
File preview (text files with syntax highlighting, images with preview)
File metadata (size, modification date, type)

The file browser is available on all plans for browsing and previewing.

File Operations

Full file management from the browser panel. Pro+

Create File — Create a new empty text file in the current folder
Create Folder — Create a new subfolder
Upload — Upload files from anywhere on your PC into the workspace (50 MB max per file)
Rename — Rename any file or folder
Edit — Open text files in an editor, make changes, and save
Delete — Delete files or folders (folders are deleted with all contents)

AI File Tools

The AI can manage files through chat commands. These tools are available when talking to the AI in chat, Telegram, or scheduled tasks:

read_file — Read the contents of a text file
write_file — Create a new file (auto-converts to append if file has content)
append_to_file — Add content to the end of a file
create_folder — Create a new directory
copy_file — Copy a file to a new location
move_file / rename_file — Move or rename files
delete_file — Delete a file
list_folder — List the contents of a directory

All file tools respect the workspace boundary and include safety guards to prevent accidental data loss.

OCR & Receipt Processing

Overview

AumaTron includes built-in OCR (Optical Character Recognition) powered by Tesseract.js. It can read text from images of receipts, invoices, documents, and more.

OCR is commonly used in scheduled tasks to automatically process receipt images, extract totals and dates, and write summaries to a log file.

Language Support

Built-in support for:

English — Full support
Thai — Full support including Thai numerals and Buddhist Era dates

OCR runs with both Thai and English language packs simultaneously for accurate results on bilingual documents.

Smart Extraction

After reading the text, AumaTron applies intelligent extraction:

Amount Detection (3-tier)

Keyword match — Looks for amounts near keywords like "Total", "Grand Total", "Net" (in English and Thai)
Repeated amounts — If the same amount appears multiple times, it's likely the total
Largest amount — Falls back to the largest reasonable number found

Date Detection

Recognizes multiple date formats:

DD/MM/YYYY, DD-MM-YY, DD Mon YYYY
Buddhist Era years (e.g. 2568 = 2025 CE) — auto-converted

Validation

Confidence threshold: OCR results below 40% confidence are auto-skipped
Amount validation: Rejects tracking numbers (>9 digits) and zero/unreasonable values
Results are cached per task run to prevent redundant processing

Best Practices

Good lighting: Well-lit, high-contrast receipt photos give the best results
Flat and straight: Avoid crumpled or angled receipts
Reasonable resolution: 300+ DPI or a clear phone photo works well
Use the OCR sidecar app: Install the Receipt OCR app from Settings → Apps for enhanced preprocessing (sharpening, contrast adjustment) that improves accuracy
Check the confidence: If OCR frequently produces garbage results, the image quality may be too low

Brain System

Personalities

Personalities shape how the AI behaves, its tone, focus areas, and response style.

Using Personalities

Create a .md file in the brain/personalities/ folder (or the brain/ root folder)
Write instructions for the AI's behavior (e.g. "You are a helpful assistant focused on e-commerce. Always be concise.")
Select the personality when starting a new conversation or assigning it to a scheduled task

Example personality file: brain/personalities/concise.md

You are a concise, efficient assistant.
- Keep responses under 3 sentences unless asked for detail
- Use bullet points for lists
- Skip pleasantries and get straight to the answer
- When browsing websites, extract only the specific data requested

Each conversation and each scheduled task can use a different personality.

Memory Files

Memory files give the AI persistent knowledge that it can recall based on context.

Create .md files in the brain/ folder
Write facts, preferences, or reference data the AI should know
The AI automatically retrieves relevant memories by matching keywords from your messages against memory file names and content

Example memory file: brain/shopify-stores.md

My Shopify stores:
- Main store: myshop.myshopify.com (Fashion, ~200 orders/day)
- Second store: myshop2.myshopify.com (Electronics, ~50 orders/day)
- Reports should be saved to the "shopify-reports" folder
- Always include order count and revenue in daily summaries

Token Budgets

The brain system automatically manages token usage to stay within provider limits:

Local AI	800 token budget
Cloud providers	4,000 token budget
Max memories per request	5
Min keyword length	4 characters

If the personality + memories exceed the budget, the system automatically truncates: first reducing personality to 50% of budget, then reducing to 2 memories, then dropping memories entirely.

Voice (STT & TTS)

Speech-to-Text

Provider	Plan	Description
Web Speech API	Free+	Built into your browser. No setup needed. Quality varies by browser.
Whisper Local	Pro+	OpenAI's Whisper model running locally. Private, no API costs.
Whisper Cloud	Pro+	Whisper via OpenAI API. Higher accuracy, requires OpenAI key.
Deepgram	Ultra	Real-time streaming STT with high accuracy and low latency.
Custom Endpoint	Ultra	Connect any STT API via custom endpoint URL.

Configure your STT provider in Settings → STT.

Text-to-Speech

Provider	Plan	Description
Browser TTS	Free+	Built-in browser speech synthesis. Works everywhere, basic quality.
ElevenLabs	Pro+	Premium AI voices with natural intonation. Requires ElevenLabs API key.
OpenAI TTS	Pro+	High-quality voices via OpenAI API. Multiple voice options.
Fish Audio	Pro+	Alternative TTS provider with unique voice options.
Custom Endpoint	Ultra	Connect any TTS API via custom endpoint URL.

Configure your TTS provider in Settings → TTS.

Apps & Custom API Tools

Installing Apps

Apps are sidecar services that extend AumaTron's capabilities.

Go to Settings → Apps
Enter the app's URL (e.g. http://localhost:3001)
AumaTron fetches the app's manifest to verify its capabilities
Click Install

Once installed, the app's endpoints are automatically registered as AI tools. For example, the Receipt OCR app becomes available as a tool the AI can call during tasks.

First-Party App: Receipt OCR Extractor

The Receipt OCR app provides enhanced receipt scanning with Sharp image preprocessing (sharpening, contrast, binarization) plus Tesseract OCR. It produces better results than the built-in OCR for difficult receipts.

Run it alongside AumaTron on port 3001. If the app goes down, AumaTron gracefully falls back to built-in OCR.

Custom API Tools

Create tools that let the AI call any external API. Pro+

Go to Settings → API Tools
Click New Tool
Configure:
- Name — Tool identifier (becomes api_yourname)
- API URL — The endpoint to call
- HTTP Method — GET, POST, PUT, DELETE
- Auth — None, Bearer token, API key header, or Basic auth
- Parameters — Define what the AI can pass to the API
Click Test Connection to verify it works
Save

The AI will automatically use the tool when relevant to your request. All credentials are encrypted with AES-256-GCM.

Security & Privacy

Encryption

AumaTron encrypts all sensitive data at rest:

Saved site passwords — AES-256-GCM encrypted in the database
API keys — AES-256-GCM encrypted
Custom API tool credentials — AES-256-GCM encrypted
Session tokens — JWT with signed secrets

The encryption key is generated automatically on first run and stored in your .env file. If you need to rotate keys, a migration script is included.

Website Filtering

Control which websites the AI can access. Pro+

Whitelist mode — Only allow specific domains
Blacklist mode — Block specific domains, allow everything else

Configure in Settings → General → Security.

Running Fully Offline

AumaTron can run with zero internet connectivity:

Use Local AI (Llama) as your AI provider
Use Web Speech API for voice (works offline in some browsers)
Use Browser TTS for text-to-speech
All data stays on your machine — no cloud calls needed

The only feature that requires internet is browser automation for external websites (obviously). Everything else — file management, OCR, scheduling, chat — works fully offline.

Plans & Billing

Plan Comparison

Feature	Free	Pro ($12/mo)	Ultra ($49/mo)
Scheduled tasks	5	Unlimited	Unlimited
Browser runs/month	100	Unlimited	Unlimited
AI providers	Local only	All + custom	All + custom
Browsers	Firefox	All (Firefox, Chrome, Tor)	All + headless
Concurrent tasks	2	5	Unlimited
File management	Browse only	Full (create, edit, upload, delete)	Full + batch ops
Task chaining	No	Yes	Yes
Voice (STT)	Web Speech	+ Whisper	+ Deepgram, custom
Voice (TTS)	Browser	+ ElevenLabs, OpenAI, Fish	+ Custom endpoints
Team members	1	1	Unlimited
Telegram	1 bot	1 bot	Multi-bot
VPN	No	1 profile	Multiple profiles
API access	No	No	REST API + webhooks
Support	Community	Email	Priority + SLA

Annual billing saves 17%: Pro is $120/year ($10/mo), Ultra is $490/year (~$40.83/mo).

Upgrading

Upgrade from Settings → General → Plan, or from the pricing page. Your data and configuration are preserved — no migration needed.

14-day money-back guarantee on Pro and Ultra
Cancel anytime — access continues until the end of the billing period
Downgrade to Free at any time — you keep your data, but gated features become unavailable

Troubleshooting

FAQ

Where do downloaded files go?

Files downloaded by the browser go to your operating system's default download folder (e.g. C:\Users\YourName\Downloads). AumaTron does not override the browser's download directory. To have the AI work with downloaded files, set your workspace path to a folder that includes your Downloads directory.

Can I use AumaTron without an internet connection?

Yes. Use Local AI (Llama) as your provider and all features except external website browsing work fully offline. OCR, file management, scheduling, and chat all work without internet.

Is my data private?

Completely. AumaTron runs on your machine. If you use Local AI, nothing ever leaves your device. If you use cloud providers (OpenAI, Claude, Deepseek), only your chat messages are sent to the provider's API. Your files, credentials, and database stay local.

How much faster are persistent sessions?

About 66% faster. A first-time login to a website takes approximately 9 steps. With a persistent session, subsequent visits take only 3 steps because the browser is already logged in.

Can I start free and upgrade later?

Yes. Start on the Free plan and upgrade whenever you need more AI providers, advanced scheduling, file management, or team features. No data migration needed — everything carries over.

What AI model should I use?

For simple repetitive tasks (checking order counts, reading files), a small local model or GPT-4o-mini works great and costs little or nothing. For complex tasks requiring reasoning (writing reports, multi-step workflows), use GPT-4o, Claude, or a larger local model.

Common Issues

Browser won't launch

Make sure no other instance of AumaTron's browser is running
Check that the browser engine is installed (Firefox for Playwright, Chrome for Puppeteer)
Try restarting AumaTron
On first run, Playwright may need to download Firefox — ensure internet connectivity

OCR returns garbage text

Check image quality — blurry or dark images produce poor results
Ensure the image is a common format (JPG, PNG)
If confidence is below 40%, AumaTron automatically skips the result
Try the OCR sidecar app for better preprocessing

Scheduled task runs but does nothing

Check the task prompt — be specific about what the AI should do
Verify that saved site credentials are correct
Check if the expected files/folders exist in the workspace
Review the task execution log for error messages

Port 3000 is already in use

Another application is using port 3000
Check if another AumaTron instance is running
Change the port in your .env file: PORT=3001

Local AI is slow

Increase GPU layers in Settings → AI Providers → Local AI (requires NVIDIA GPU)
Use a smaller model (Phi-3 Mini instead of Llama 70B)
Ensure no other GPU-intensive applications are running
Quantized models (Q4, Q5) are much faster than full-precision

"ENCRYPTION_KEY is required" error on startup

Your .env file is missing or doesn't contain an ENCRYPTION_KEY
On first run, this should be generated automatically
If your .env was deleted, you'll need to create a new one (see .env.example)
Warning: if you lose the old key, previously encrypted passwords can't be decrypted