Abserny Core

Offline object detection system for visually impaired users

Introduction

Abserny (أبصِرني) is an offline object detection system designed specifically for visually impaired users. It combines voice recognition, computer vision, and text-to-speech to provide accessible environmental awareness through natural Arabic language interaction.

The system operates entirely offline after initial setup, ensuring complete user privacy and reliability without internet dependency.

Key Features

  • Voice-activated detection using Arabic trigger words
  • Real-time YOLOv8-based object detection
  • Natural Arabic language descriptions
  • Complete offline operation
  • Cross-platform support (Windows, macOS, Linux)
  • Privacy-first architecture

Installation

System requirements:

  • Python 3.8 or higher
  • 4GB RAM minimum (8GB recommended)
  • Webcam (built-in or USB)
  • Microphone (built-in or external)
  • 2GB free disk space

Installation Steps

Clone the repository:

git clone https://github.com/yourusername/abserny.git
cd abserny

Install dependencies:

pip install -r requirements.txt

Download required models:

python download_models.py

Quick Start

Launch the application:

python main.py

The system will initialize and begin listening for Arabic voice commands. Simply say one of the trigger words to activate object detection.

Voice Control

Abserny responds to the following Arabic trigger words:

Trigger Words

  • ابدأ (Ibda') - Start detection
  • اكتشف (Iktashif) - Detect objects
  • شوف (Shuf) - See what's there
  • انظر (Undhur) - Look
  • امسح (Imsah) - Scan area

How It Works

When a trigger word is detected:

  1. System captures current camera frame
  2. YOLOv8 processes the image
  3. Objects are identified and located
  4. Natural Arabic description is generated
  5. Description is spoken through TTS
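The description step (step 4) can be sketched as a small pure function over the class names detected in one frame. This is an illustrative sketch, not Abserny's actual implementation: the function name and templates are hypothetical, and English strings are used here for readability where the real system would use Arabic templates.

```python
from collections import Counter

def describe_detections(names):
    """Summarize one frame's detected class names as a spoken sentence.

    Illustrative only: the real system generates Arabic descriptions;
    English is used here so the logic is easy to follow.
    """
    if not names:
        return "No objects detected."
    counts = Counter(names)  # insertion-ordered in Python 3.7+
    parts = [f"{n} x {name}" for name, n in counts.items()]
    return "I can see " + ", ".join(parts) + "."

print(describe_detections(["person", "chair", "chair"]))
# → I can see 1 x person, 2 x chair.
```

The resulting string would then be handed to the TTS engine in step 5.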

Object Detection

The system uses YOLOv8 (You Only Look Once, version 8) for real-time object detection. The nano variant (yolov8n) is used by default, offering the best trade-off between speed and accuracy on modest hardware.

Detection Process

The detection pipeline includes:

  • Frame capture and preprocessing
  • Object detection inference
  • Confidence filtering
  • Natural language generation
  • Speech synthesis
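The confidence-filtering stage can be sketched as a pure function over (class name, confidence) pairs, mirroring the confidence_threshold and max_detections settings from config.yaml. The function name and tuple format here are assumptions for illustration, not Abserny's internal API.

```python
def filter_detections(detections, confidence_threshold=0.5, max_detections=10):
    """Keep the strongest detections above the threshold.

    `detections` is a list of (class_name, confidence) tuples, as a
    YOLO postprocessing step might produce. Sorting by confidence
    before truncating ensures max_detections keeps the most certain
    objects rather than the first ones encountered.
    """
    kept = [d for d in detections if d[1] >= confidence_threshold]
    kept.sort(key=lambda d: d[1], reverse=True)
    return kept[:max_detections]

dets = [("person", 0.91), ("chair", 0.42), ("tv", 0.63)]
print(filter_detections(dets))  # chair falls below the 0.5 threshold
```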

Supported Objects

The default model detects the 80 common object classes of the COCO dataset, including:

  • People
  • Furniture and household items
  • Electronics and devices
  • Vehicles
  • Animals
  • Food and beverages

Offline Mode

All processing happens locally on your device:

Components

  • Vosk - Offline speech recognition
  • YOLOv8 - Object detection model
  • pyttsx3 - Text-to-speech synthesis

Privacy

No data is transmitted to external servers. All voice processing, object detection, and speech synthesis occur entirely on your local machine.

Settings

Configuration is managed through config.yaml:

camera:
  device_id: 0
  resolution: [640, 480]
  fps: 30

detection:
  confidence_threshold: 0.5
  model_path: "models/yolov8n.pt"
  max_detections: 10

speech:
  language: "ar"
  rate: 150
  volume: 0.8

recognition:
  model_path: "models/vosk-model-ar"
  trigger_words:
    - "ابدأ"
    - "اكتشف"
    - "شوف"
    - "انظر"
    - "امسح"

Customization

Camera Settings

Adjust camera resolution and FPS based on your hardware:

camera:
  resolution: [1280, 720]  # HD
  fps: 15  # For slower systems

Detection Tuning

Modify confidence threshold to balance detection sensitivity:

detection:
  confidence_threshold: 0.6  # Higher = fewer but more confident detections

Speech Customization

Adjust speech rate and volume:

speech:
  rate: 120  # Slower speech
  volume: 1.0  # Maximum volume

API Reference

Use Abserny programmatically in your applications:

Basic Usage

from abserny import Detector

# Initialize detector
detector = Detector()

# Detect objects in image
results = detector.detect('image.jpg')

# Process results
for obj in results:
    print(f"{obj.name}: {obj.confidence:.2f}")

Voice-Activated Mode

from abserny import VoiceDetector

# Initialize
detector = VoiceDetector(language='ar')

# Define callback
@detector.on_detection
def handle_detection(results):
    for obj in results:
        print(f"Found: {obj.name}")

# Start listening
detector.start_listening()
detector.run()

Troubleshooting

Camera Issues

Problem: Camera not detected

  • Check camera permissions in system settings
  • Ensure camera is not in use by another application
  • Try different camera.device_id values (0, 1, 2)
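One way to find a working camera.device_id is to probe indices in order. In the sketch below the probe function is injected so the logic is hardware-independent; with OpenCV (an assumption about the project's camera backend) the probe would be something like `lambda i: cv2.VideoCapture(i).isOpened()`.

```python
def first_working_camera(is_open, max_id=4):
    """Return the first camera index that opens, or None.

    `is_open` takes a device id and reports whether a capture device
    opened there; injecting it keeps this sketch testable without
    real hardware.
    """
    for device_id in range(max_id + 1):
        if is_open(device_id):
            return device_id
    return None

# With OpenCV installed, this would be called as:
#   import cv2
#   first_working_camera(lambda i: cv2.VideoCapture(i).isOpened())
print(first_working_camera(lambda i: i == 1))  # fake probe: only id 1 works
# → 1
```

The returned index can then be written into the camera section of config.yaml.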

Voice Recognition Issues

Problem: Trigger words not recognized

  • Verify microphone permissions
  • Check microphone volume levels
  • Reduce background noise
  • Speak clearly at normal pace

Performance Issues

Problem: Slow detection

  • Lower camera resolution to 320x240
  • Ensure you're using the yolov8n (nano) model
  • Close other resource-intensive applications
  • Increase frame_skip in configuration

Model Download Issues

Problem: Models fail to download

  • Check internet connection
  • Download the models manually using the links in the documentation
  • Verify models directory exists