Abserny Core

Offline object detection system for visually impaired users

Introduction

Abserny (أبصِرني) is an offline object detection system designed specifically for visually impaired users. It combines voice recognition, computer vision, and text-to-speech to provide accessible environmental awareness through natural Arabic language interaction.

The system operates entirely offline after initial setup, ensuring complete user privacy and reliability without internet dependency.

Key Features

  • Voice-activated detection using Arabic trigger words
  • Real-time YOLOv8-based object detection
  • Natural Arabic language descriptions
  • Complete offline operation
  • Cross-platform support (Windows, macOS, Linux)
  • Privacy-first architecture

Installation

System requirements:

  • Python 3.8 or higher
  • 4GB RAM minimum (8GB recommended)
  • Webcam (built-in or USB)
  • Microphone (built-in or external)
  • 2GB free disk space

Installation Steps

Clone the repository:

git clone https://github.com/yourusername/abserny.git
cd abserny

Install dependencies:

pip install -r requirements.txt

Download required models:

python download_models.py

Quick Start

Launch the application:

python main.py

The system will initialize and begin listening for Arabic voice commands. Simply say one of the trigger words to activate object detection.

Voice Control

Abserny responds to the following Arabic trigger words:

Trigger Words

  • ابدأ (Ibda') - Start detection
  • اكتشف (Iktashif) - Detect objects
  • شوف (Shuf) - See what's there
  • انظر (Undhur) - Look
  • امسح (Imsah) - Scan area

How It Works

When a trigger word is detected:

  1. System captures current camera frame
  2. YOLOv8 processes the image
  3. Objects are identified and located
  4. Natural Arabic description is generated
  5. Description is spoken through TTS
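The description step (step 4) can be sketched as a small pure function over the class names detected in one frame. This is an illustrative sketch, not Abserny's actual implementation: the function name and templates are hypothetical, and English strings are used here for readability where the real system would use Arabic templates.

```python
from collections import Counter

def describe_detections(names):
    """Summarize one frame's detected class names as a spoken sentence.

    Illustrative only: the real system generates Arabic descriptions;
    English is used here so the logic is easy to follow.
    """
    if not names:
        return "No objects detected."
    counts = Counter(names)  # insertion-ordered in Python 3.7+
    parts = [f"{n} x {name}" for name, n in counts.items()]
    return "I can see " + ", ".join(parts) + "."

print(describe_detections(["person", "chair", "chair"]))
# → I can see 1 x person, 2 x chair.
```

The resulting string would then be handed to the TTS engine in step 5.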

Object Detection

The system uses YOLOv8 (You Only Look Once, version 8) for real-time object detection. The nano variant (yolov8n) is used by default, offering the best trade-off between speed and accuracy on modest hardware.

Detection Process

The detection pipeline includes:

  • Frame capture and preprocessing
  • Object detection inference
  • Confidence filtering
  • Natural language generation
  • Speech synthesis
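The confidence-filtering stage can be sketched as a pure function over (class name, confidence) pairs, mirroring the confidence_threshold and max_detections settings from config.yaml. The function name and tuple format here are assumptions for illustration, not Abserny's internal API.

```python
def filter_detections(detections, confidence_threshold=0.5, max_detections=10):
    """Keep the strongest detections above the threshold.

    `detections` is a list of (class_name, confidence) tuples, as a
    YOLO postprocessing step might produce. Sorting by confidence
    before truncating ensures max_detections keeps the most certain
    objects rather than the first ones encountered.
    """
    kept = [d for d in detections if d[1] >= confidence_threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_detections]

dets = [("person", 0.91), ("chair", 0.42), ("tv", 0.63)]
print(filter_detections(dets))  # chair falls below the 0.5 threshold
```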

Supported Objects

The default model detects the 80 common object classes of the COCO dataset, including:

  • People
  • Furniture and household items
  • Electronics and devices
  • Vehicles
  • Animals
  • Food and beverages

Offline Mode

All processing happens locally on your device:

Components

  • Vosk - Offline speech recognition
  • YOLOv8 - Object detection model
  • pyttsx3 - Text-to-speech synthesis

Privacy

No data is transmitted to external servers. All voice processing, object detection, and speech synthesis occur entirely on your local machine.

Settings

Configuration is managed through config.yaml:

camera:
  device_id: 0
  resolution: [640, 480]
  fps: 30

detection:
  confidence_threshold: 0.5
  model_path: "models/yolov8n.pt"
  max_detections: 10

speech:
  language: "ar"
  rate: 150
  volume: 0.8

recognition:
  model_path: "models/vosk-model-ar"
  trigger_words:
    - "ابدأ"
    - "اكتشف"
    - "شوف"
    - "انظر"
    - "امسح"

Customization

Camera Settings

Adjust camera resolution and FPS based on your hardware:

camera:
  resolution: [1280, 720]  # HD
  fps: 15  # For slower systems

Detection Tuning

Modify confidence threshold to balance detection sensitivity:

detection:
  confidence_threshold: 0.6  # Higher = fewer but more confident detections

Speech Customization

Adjust speech rate and volume:

speech:
  rate: 120  # Slower speech
  volume: 1.0  # Maximum volume

API Reference

Use Abserny programmatically in your applications:

Basic Usage

from abserny import Detector

# Initialize detector
detector = Detector()

# Detect objects in image
results = detector.detect('image.jpg')

# Process results
for obj in results:
    print(f"{obj.name}: {obj.confidence:.2f}")

Voice-Activated Mode

from abserny import VoiceDetector

# Initialize
detector = VoiceDetector(language='ar')

# Define callback
@detector.on_detection
def handle_detection(results):
    for obj in results:
        print(f"Found: {obj.name}")

# Start listening
detector.start_listening()
detector.run()

Troubleshooting

Camera Issues

Problem: Camera not detected

  • Check camera permissions in system settings
  • Ensure camera is not in use by another application
  • Try different camera.device_id values (0, 1, 2)
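One way to find a working camera.device_id is to probe indices in order. In the sketch below the probe function is injected so the logic is hardware-independent; with OpenCV (an assumption about the project's camera backend) the probe would be something like `lambda i: cv2.VideoCapture(i).isOpened()`.

```python
def first_working_camera(is_open, max_id=4):
    """Return the first camera index that opens, or None.

    `is_open` takes a device id and reports whether a capture device
    opened there; injecting it keeps this sketch testable without
    real hardware.
    """
    for device_id in range(max_id + 1):
        if is_open(device_id):
            return device_id
    return None

# With OpenCV installed, this would be called as:
#   import cv2
#   first_working_camera(lambda i: cv2.VideoCapture(i).isOpened())
print(first_working_camera(lambda i: i == 1))  # fake probe: only id 1 works
# → 1
```

The returned index can then be written into the camera section of config.yaml.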

Voice Recognition Issues

Problem: Trigger words not recognized

  • Verify microphone permissions
  • Check microphone volume levels
  • Reduce background noise
  • Speak clearly at normal pace

Performance Issues

Problem: Slow detection

  • Lower camera resolution to 320x240
  • Ensure you're using the yolov8n (nano) model
  • Close other resource-intensive applications
  • Increase frame_skip in configuration

Model Download Issues

Problem: Models fail to download

  • Check internet connection
  • Download the models manually using the links in the documentation
  • Verify models directory exists