Abserny Core
Introduction
Abserny (أبصِرني) is an offline object detection system designed specifically for visually impaired users. It combines voice recognition, computer vision, and text-to-speech to provide accessible environmental awareness through natural Arabic language interaction.
The system operates entirely offline after initial setup, ensuring complete user privacy and reliability without internet dependency.
Key Features
- Voice-activated detection using Arabic trigger words
- Real-time YOLOv8-based object detection
- Natural Arabic language descriptions
- Complete offline operation
- Cross-platform support (Windows, macOS, Linux)
- Privacy-first architecture
Installation
System requirements:
- Python 3.8 or higher
- 4GB RAM minimum (8GB recommended)
- Webcam (built-in or USB)
- Microphone (built-in or external)
- 2GB free disk space
Installation Steps
Clone the repository:
git clone https://github.com/yourusername/abserny.git
cd abserny
Install dependencies:
pip install -r requirements.txt
Download required models:
python download_models.py
Quick Start
Launch the application:
python main.py
The system will initialize and begin listening for Arabic voice commands. Simply say one of the trigger words to activate object detection.
Voice Control
Abserny responds to the following Arabic trigger words:
Trigger Words
- ابدأ (Ibda') - Start detection
- اكتشف (Iktashif) - Detect objects
- شوف (Shuf) - See what's there
- انظر (Undhur) - Look
- امسح (Imsah) - Scan area
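The core of trigger-word handling is a simple membership check against the recognized transcript. A minimal sketch in Python (the names below are illustrative, not the actual Abserny API):

```python
# Illustrative trigger-word matching; TRIGGER_WORDS mirrors the list above,
# but the function name is hypothetical, not part of the Abserny codebase.
TRIGGER_WORDS = {"ابدأ", "اكتشف", "شوف", "انظر", "امسح"}

def contains_trigger(transcript: str) -> bool:
    """Return True if any word in the recognized transcript is a trigger word."""
    return any(word in TRIGGER_WORDS for word in transcript.split())
```

A trigger word can appear anywhere in the utterance, so the check scans every recognized word rather than requiring an exact phrase.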
How It Works
When a trigger word is detected:
- System captures current camera frame
- YOLOv8 processes the image
- Objects are identified and located
- Natural Arabic description is generated
- Description is spoken through TTS
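The steps above can be sketched as a small pipeline. This is a hypothetical outline, not the project's actual code: capture and inference are stubbed so the flow is visible without a camera or model, and the description is rendered in English here for clarity (the real system generates Arabic):

```python
# Hypothetical sketch of the trigger-to-speech flow; all names are
# illustrative stand-ins, not the real Abserny internals.
from dataclasses import dataclass

@dataclass
class Detection:
    name: str          # object label
    confidence: float  # detection confidence, 0..1

def capture_frame():
    # Stand-in for a camera grab (e.g. via OpenCV in the real system).
    return "frame"

def detect_objects(frame):
    # Stand-in for YOLOv8 inference returning labelled detections.
    return [Detection("person", 0.91), Detection("chair", 0.77)]

def describe(detections):
    # Turn detections into a spoken sentence (Arabic in the real system).
    names = ", ".join(d.name for d in detections)
    return f"I see: {names}"

def on_trigger():
    frame = capture_frame()
    detections = detect_objects(frame)
    sentence = describe(detections)
    # In the real system this sentence would be passed to the TTS engine.
    return sentence
```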
Object Detection
The system uses YOLOv8 (You Only Look Once, version 8) for real-time object detection. The nano variant (yolov8n) is used by default because it offers a good balance of speed and accuracy on modest hardware.
Detection Process
The detection pipeline includes:
- Frame capture and preprocessing
- Object detection inference
- Confidence filtering
- Natural language generation
- Speech synthesis
Supported Objects
The default YOLOv8 model detects the 80 object classes of the COCO dataset, including:
- People and body parts
- Furniture and household items
- Electronics and devices
- Vehicles
- Animals
- Food and beverages
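YOLOv8's built-in labels are English, so generating Arabic descriptions implies a label translation step. A minimal sketch with a few entries (an illustrative mapping, not the project's actual lookup table):

```python
# Illustrative English-to-Arabic label mapping for a handful of COCO
# classes; the real system's vocabulary covers all detected categories.
LABELS_AR = {
    "person": "شخص",
    "chair": "كرسي",
    "dog": "كلب",
    "car": "سيارة",
    "bottle": "زجاجة",
}

def to_arabic(label: str) -> str:
    # Fall back to the English label if no translation is known.
    return LABELS_AR.get(label, label)
```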
Offline Mode
All processing happens locally on your device:
Components
- Vosk - Offline speech recognition
- YOLOv8 - Object detection model
- pyttsx3 - Text-to-speech synthesis
Privacy
No data is transmitted to external servers. All voice processing, object detection, and speech synthesis occur entirely on your local machine.
Settings
Configuration is managed through config.yaml:
camera:
  device_id: 0
  resolution: [640, 480]
  fps: 30

detection:
  confidence_threshold: 0.5
  model_path: "models/yolov8n.pt"
  max_detections: 10

speech:
  language: "ar"
  rate: 150
  volume: 0.8

recognition:
  model_path: "models/vosk-model-ar"
  trigger_words:
    - "ابدأ"
    - "اكتشف"
    - "شوف"
    - "انظر"
    - "امسح"
Customization
Camera Settings
Adjust camera resolution and FPS based on your hardware:
camera:
  resolution: [1280, 720]  # HD
  fps: 15                  # for slower systems
Detection Tuning
Modify confidence threshold to balance detection sensitivity:
detection:
  confidence_threshold: 0.6  # higher = fewer but more confident detections
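The effect of this setting is easy to see in isolation. A sketch of the filtering step (the `Detection` type and function name are stand-ins, not the actual Abserny classes), also applying the `max_detections` cap from the configuration:

```python
# Illustrative confidence filtering; keeps at most max_detections results
# at or above the threshold, strongest first. Names are hypothetical.
from dataclasses import dataclass

@dataclass
class Detection:
    name: str
    confidence: float

def filter_detections(detections, threshold=0.6, max_detections=10):
    """Drop weak detections, then return the strongest survivors."""
    kept = [d for d in detections if d.confidence >= threshold]
    kept.sort(key=lambda d: d.confidence, reverse=True)
    return kept[:max_detections]
```

With a threshold of 0.6, a detection at 0.55 confidence is dropped while ones at 0.72 and 0.91 are kept.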
Speech Customization
Adjust speech rate and volume:
speech:
  rate: 120    # slower speech
  volume: 1.0  # maximum volume
API Reference
Use Abserny programmatically in your applications:
Basic Usage
from abserny import Detector

# Initialize detector
detector = Detector()

# Detect objects in image
results = detector.detect('image.jpg')

# Process results
for obj in results:
    print(f"{obj.name}: {obj.confidence:.2f}")
Voice-Activated Mode
from abserny import VoiceDetector

# Initialize
detector = VoiceDetector(language='ar')

# Define callback
@detector.on_detection
def handle_detection(results):
    for obj in results:
        print(f"Found: {obj.name}")

# Start listening
detector.start_listening()
detector.run()
Troubleshooting
Camera Issues
Problem: Camera not detected
- Check camera permissions in system settings
- Ensure camera is not in use by another application
- Try different camera.device_id values (0, 1, 2)
Voice Recognition Issues
Problem: Trigger words not recognized
- Verify microphone permissions
- Check microphone volume levels
- Reduce background noise
- Speak clearly at normal pace
Performance Issues
Problem: Slow detection
- Lower camera resolution to 320x240
- Ensure you're using yolov8n (nano) model
- Close other resource-intensive applications
- Increase frame_skip in the configuration
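One way to read the frame_skip setting: with frame_skip set to 2, two frames are dropped between each processed one, cutting inference load to a third. A sketch of that interpretation (the helper name is hypothetical, not part of the Abserny codebase):

```python
# Illustrative frame skipping: with frame_skip = N, every (N+1)-th frame
# is run through detection and the rest are discarded.
def frames_to_process(total_frames: int, frame_skip: int):
    """Yield indices of frames that would be run through detection."""
    step = frame_skip + 1
    for i in range(0, total_frames, step):
        yield i
```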
Model Download Issues
Problem: Models fail to download
- Check internet connection
- Download models manually from documentation
- Verify models directory exists