Data Scientist

3 days ago


Tunis, Berges du Lac, Tunisia | Clusterlab | Full-time | US$80,000 - US$120,000 per year
We're looking for a Data Scientist with strong fundamentals in machine learning and a deep interest in voice technology. This role focuses on building and fine-tuning voice-related models from scratch, including speech-to-text, speaker diarization, audio classification, and LLM-integrated speech systems. You'll work across the full data science stack: from data collection and curation to model development, evaluation, and production deployment.

Lead the development of custom models for speech recognition, transcription, audio segmentation, and speaker identification.

Build robust data pipelines: collecting, preprocessing, cleaning, and labeling large audio datasets.

Fine-tune and evaluate state-of-the-art open-source models (e.g., Whisper, wav2vec, HuBERT, Conformer) on proprietary datasets.

Design experiments and benchmark models for quality, latency, and domain adaptability.

Work closely with product teams to embed voice capabilities into real-time applications (e.g., live summarization, AI agents, call insights).

Maintain scalable training, evaluation, and inference workflows using modern ML tooling (e.g., PyTorch, Hugging Face, Weights & Biases).

Contribute to internal knowledge sharing and best practices around audio ML.



Requirements
  • Strong experience with speech or audio ML: speech recognition, speaker diarization, voice activity detection, etc.

  • Hands-on experience building models from scratch and fine-tuning large models.

  • Deep understanding of signal processing, feature extraction, and data augmentation for audio.

  • Proficient in Python and common ML libraries: PyTorch, NumPy, Scikit-learn, Hugging Face.

  • Familiarity with end-to-end ML pipelines: data cleaning, training, tuning, evaluation, and serving.

  • Comfortable using cloud platforms (GCP, AWS) and containerized environments.

  • High agency and comfort working in fast-paced, ambiguous environments.


Nice to Have
  • Experience with LLM + speech integration (e.g., Whisper + GPT pipelines).

  • Knowledge of real-time systems or streaming inference.

  • Understanding of multilingual ASR challenges and dialect modeling (Arabic dialects a plus).

  • Experience with tools like DVC, MLflow, or W&B for experiment tracking.



Benefits
  • The chance to build domain-defining voice technology from scratch.

  • Exposure to real-world deployments and rapid iteration cycles.

  • Mentorship and collaboration with a team of high-agency engineers and researchers.

  • Flexible, remote-friendly work culture centered around ownership and outcomes.



  • Data Scientist

    3 days ago


    Tunis, Tunisia | The Quantic Factory | Full-time | 12,000 - 15,000 per year

    Reporting to the Scientific Department, you will be responsible for all data analytics work that gives our clients tools to better understand their consumers and to evaluate the performance and business impact of their marketing actions. In particular, you will: Certify all quantitative results...