Hey, this is Sadra! I’m currently a PhD student in Computer Science at USC, working at the intersection of Human-Computer Interaction (HCI) and Natural Language Processing (NLP), i.e., making LLMs better friends of humans. My current research focuses on helping people make better decisions using LLMs. On the side (and honestly, all the time), I build and maintain scientific software tools with a great team of open-source enthusiasts. I’m always looking for ways to make technology and science more accessible and fun, believing that open-source software is an ideal contribution to scientific communities that value transparency and reproducibility. In my free time, I enjoy watching movies and hunting for new places whenever I travel. I’m always curious to meet new people and hear about their journeys, so shoot me an email or DM me on any social media!
CS PhD @ USC ✌️
I'm currently in my second year and looking forward to exploring more domains to develop a taxonomy of these challenges and a framework that identifies the right interaction patterns and integration points for AI. Throughout this journey, I've had the great opportunity to work with the Adaptive Computing Experience (ACE) Lab (Souti Chattopadhyay’s lab @ GCS) and [CUTE LAB NAME] (Jonathan May’s lab @ ISI).
You can find some of my publications below:
[ICSE25] Trust dynamics in AI-assisted development: Definitions, factors, and implications, Sadra Sabouri, Philipp Eibl, Xinyi Zhou, Morteza Ziyadi, Nenad Medvidovic, Lars Lindemann, Souti Chattopadhyay
We investigate how developers define, evaluate, and evolve trust in AI-generated code suggestions through a mixed-method study involving surveys and observations. We found that while comprehensibility and perceived correctness are key to trust decisions, developers often revise their choices, accepting only 52% of AI suggestions, highlighting the need for better real-time support and offering four validated guidelines to improve developer-AI collaboration.
[ACL25] ELI-Why: Evaluating the Pedagogical Utility of Language Model Explanations, Brihi Joshi, Keyu He, Sahana Ramnath, Sadra Sabouri, Kaitlyn Zhou, Souti Chattopadhyay, Swabha Swayamdipta, Xiang Ren
We investigate how well language models adapt explanations to learners with varying educational backgrounds using ELI-Why, a benchmark of 13.4K "Why" questions. Through two human studies, we found that GPT-4 explanations align with intended grade levels only 50% of the time and are rated 20% less suitable for learners’ needs compared to layperson-curated responses, revealing limitations in their pedagogical adaptability.
Always happy to chat, collaborate, or just hear what you're working on; feel free to reach out!
Open World Developer 🌐
Below is a topic-based summary of my work, including work done through OpenSciLab, dataset releases, and independent projects:
Natural Language Processing and Large Language Models
Memor: Managing and Transferring Conversational Memory Across LLMs
Memor is designed to help users manage the memory of their interactions with Large Language Models (LLMs). It enables users to access and utilize the history of their conversations when prompting LLMs, creating a more personalized and context-aware experience. Users can select specific parts of past interactions with one LLM and share them with another. By bridging the gap between isolated LLM instances, Memor changes the way users interact with AI, making transitions between models smoother.
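To give a feel for the idea, here is a conceptual sketch (not Memor's actual API) of carrying a selected piece of history from one model over to another:

```python
# Conceptual sketch only (not Memor's API): store turns per model and
# re-render a selected slice as generic chat messages for another LLM.
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Turn:
    role: str      # "user" or "assistant"
    content: str
    model: str     # which LLM this turn belongs to


@dataclass
class Memory:
    turns: List[Turn] = field(default_factory=list)

    def add(self, role: str, content: str, model: str) -> None:
        self.turns.append(Turn(role, content, model))

    def select(self, model: str) -> List[Turn]:
        # keep only the turns from a specific model
        return [t for t in self.turns if t.model == model]

    @staticmethod
    def render(turns: List[Turn]) -> List[Dict[str, str]]:
        # generic chat-message format usable as context for a different LLM
        return [{"role": t.role, "content": t.content} for t in turns]


memory = Memory()
memory.add("user", "Summarize my project goals.", model="model-a")
memory.add("assistant", "Your goals are X, Y, and Z.", model="model-a")

# Reuse the model-a exchange as context when prompting a different model.
context = Memory.render(memory.select("model-a"))
print(context)
```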
[JAIAI] naab: A ready-to-use plug-and-play corpus for Farsi, Sadra Sabouri, Elnaz Rahmati, Soroush Gooran, Hossein Sameti
The issue of large training data is (was at that time :D) even more pressing in lower-resource languages like Farsi. We propose naab, a huge, cleaned, and ready-to-use open-source textual corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the Farsi word NAAB, which means pure and high-grade.
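If you want to peek at the corpus without downloading all of it, streaming it through the Hugging Face datasets library works well; the hub ID SLPL/naab and the "text" column name below are my assumptions, so check the dataset card for the exact identifiers:

```python
# Stream a few paragraphs from naab instead of downloading ~130GB.
from datasets import load_dataset

naab = load_dataset("SLPL/naab", split="train", streaming=True)  # assumed hub ID
for i, record in enumerate(naab):
    print(record["text"][:100])  # assumed column name
    if i == 2:
        break
```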
[ALP@NAACL25] Parsipy: NLP toolkit for historical Persian texts in Python, Farhan Farsi, Parnian Fazel, Sepand Haghighi, Sadra Sabouri, Farzaneh Goshtasb, Nadia Hajipour, Ehsaneddin Asgari, Hossein Sameti
The study of historical languages presents unique challenges due to their complex orthographic systems, fragmentary textual evidence, and the absence of standardized digital representations of text in those languages. This work introduces an NLP toolkit designed to facilitate the analysis of historical Persian languages by offering modules for tokenization, lemmatization, part-of-speech tagging, phoneme-to-transliteration conversion, and word embedding.
[LoResMT@NAACL25] PahGen: Generating Ancient Pahlavi Text via Grammar-guided Zero-shot Translation, Farhan Farsi, Parnian Fazel, Farzaneh Goshtasb, Nadia Hajipour, Sadra Sabouri, Ehsaneddin Asgari, Hossein Sameti
Due to its limited digital presence and the scarcity of comprehensive linguistic resources, Pahlavi (Middle Persian) is at risk of extinction. This study introduces a framework to translate English text into Pahlavi. Our approach combines grammar-guided term extraction with zero-shot translation, leveraging large language models (LLMs) to generate syntactically and semantically accurate Pahlavi sentences. Finally, using our framework, we generate a novel dataset of 360 expert-validated parallel English-Pahlavi texts.
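As a rough illustration of the idea (not the paper's actual code), grammar-guided term extraction can be thought of as looking up glossary terms for a sentence and folding them into a zero-shot translation prompt; the glossary entries below are hypothetical:

```python
# Hypothetical English -> Pahlavi (transliterated) glossary entries, for illustration only.
GLOSSARY = {
    "king": "šāh",
    "fire": "ātaxš",
    "water": "āb",
}


def extract_terms(sentence: str) -> dict:
    """Return glossary entries whose English key appears in the sentence."""
    words = sentence.lower().split()
    return {en: pal for en, pal in GLOSSARY.items() if en in words}


def build_prompt(sentence: str) -> str:
    """Compose a zero-shot translation prompt constrained by the extracted terms."""
    terms = extract_terms(sentence)
    term_lines = "\n".join(f"- {en} -> {pal}" for en, pal in terms.items())
    return (
        "Translate the following English sentence into grammatical Pahlavi.\n"
        f"Use these term mappings where they apply:\n{term_lines}\n"
        f"Sentence: {sentence}\nTranslation:"
    )


print(build_prompt("The king saw the fire"))
```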
[DialDoc@ACL22] Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval, Sayed Hesam Alavian, Ali Satvaty, Sadra Sabouri, Ehsaneddin Asgari, Hossein Sameti
This paper discusses our proposed approach, Docalog, for the DialDoc-22 (MultiDoc2Dial) shared task, which was part of my BSc thesis. Docalog has a three-stage pipeline consisting of (1) a document retriever model, (2) an answer span prediction model, and (3) a final span picker that decides on the most likely answer span out of all predicted spans.
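A simplified sketch of that three-stage idea (not our DialDoc-22 submission itself) could look like this, using a TF-IDF retriever and an off-the-shelf QA model:

```python
# (1) retrieve candidate documents, (2) predict an answer span per document,
# (3) keep the highest-scoring span.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from transformers import pipeline

documents = [
    "Refunds are processed within 5 business days after the item is received.",
    "To reset your password, open account settings and choose 'Reset password'.",
]
question = "How long do refunds take?"

# (1) document retriever: TF-IDF similarity between the question and each document
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)
scores = cosine_similarity(vectorizer.transform([question]), doc_vectors)[0]
top_docs = [documents[i] for i in scores.argsort()[::-1][:2]]

# (2) answer span prediction on each retrieved document
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
spans = [qa(question=question, context=doc) for doc in top_docs]

# (3) span picker: keep the most confident predicted span
best = max(spans, key=lambda s: s["score"])
print(best["answer"], best["score"])
```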
Speech Processing
Nava: OS-Native Sound Engine in Python
Nava allows users to play sound in Python without any dependencies or platform restrictions. It is a cross-platform solution that runs on any operating system, including Windows, macOS, and Linux. Its lightweight and easy-to-use design makes Nava an ideal choice for developers looking to add sound functionality to their Python programs.
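A minimal usage sketch; the exact parameter names (e.g., async_mode) are from memory and may differ from the current docs:

```python
from nava import play, stop

play("alarm.wav")                              # blocking playback

sound_id = play("alarm.wav", async_mode=True)  # non-blocking playback (assumed flag name)
stop(sound_id)                                 # stop it by id
```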
Sharif-Wav2Vec2.0: Wave2Vec2.0 Speech Processing Model Tailored for Farsi
The base model was fine-tuned on 108 hours of Common Voice's Farsi audio. The token set and the language model were changed to support nuances of Farsi that aren't present in English. More technically, we trained a 5-gram language model using the KenLM toolkit and used it in the processor, which increased our accuracy on online ASR.
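Here's a quick way to try it with the Transformers ASR pipeline; the hub ID SLPL/Sharif-wav2vec2 is my best recollection, so double-check the model card:

```python
from transformers import pipeline

# Assumed hub ID; expects a 16kHz mono audio file path.
asr = pipeline("automatic-speech-recognition", model="SLPL/Sharif-wav2vec2")
result = asr("farsi_sample.wav")
print(result["text"])
```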
Machine Learning (ML)
PyCM: Multi-class confusion matrix library in Python
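PyCM builds a confusion matrix from actual and predicted label vectors and exposes a large set of per-class and overall statistics; a minimal example:

```python
from pycm import ConfusionMatrix

actual = [2, 0, 2, 2, 0, 1]
predict = [0, 0, 2, 2, 0, 2]

cm = ConfusionMatrix(actual_vector=actual, predict_vector=predict)
cm.print_matrix()       # class-by-class confusion table
print(cm.Overall_ACC)   # overall accuracy
print(cm.TPR)           # per-class recall (true positive rate)
```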
Network
PyRGG: Python Random Graph Generator
IPSpot: A Python Tool to Fetch the System's IP Address
PyMilo: A Python library for ML I/O, AmirHosein Rostami, Sepand Haghighi, Sadra Sabouri, Alireza Zolanvari
PyMilo addresses the limitations of existing Machine Learning (ML) model storage formats by providing a transparent, reliable, and safe method for exporting and deploying trained models. Current formats, such as pickle and other binary formats, raise reliability, safety, and transparency concerns. In contrast, PyMilo serializes ML models in a transparent, non-executable format, enabling straightforward and safe model exchange.
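A sketch of the round trip as I remember the interface (the Export/Import names and call pattern may differ slightly from the current docs):

```python
from sklearn.linear_model import LinearRegression
from pymilo import Export, Import

model = LinearRegression().fit([[1], [2], [3]], [2, 4, 6])

Export(model).save("model.json")            # transparent, non-executable JSON
restored = Import("model.json").to_model()  # back to a usable sklearn model
print(restored.predict([[4]]))
```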
Art
Samila: A Generative Art Generator, Sadra Sabouri, Sepand Haghighi, Elena Masrour
Samila lets you create images by placing many thousands of points at random. The position of every single point is calculated by a formula with random parameters. Because of the randomness of the generation process, you can hardly reproduce an image unless you have the right seed for it. I highly encourage you to take a look at the paper if you're interested.
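A tiny example of what driving the points with two functions looks like (the function bodies here are just an illustration, and minor API details may differ):

```python
import math
import random
from samila import GenerativeImage

def f1(x, y):
    return random.uniform(-1, 1) * x**2 - math.sin(y**2) + abs(y - x)

def f2(x, y):
    return random.uniform(-1, 1) * y**3 - math.cos(x**2) + 2 * x

g = GenerativeImage(f1, f2)
g.generate(seed=1018)      # same seed -> same image
g.plot()
g.save_image("art.png")
```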
Human Computer Interaction (HCI)
Nafas: Breathing Gymnastics Application, Sadra Sabouri, Sepand Haghighi
mytimer: A Timer for Command Line Enthusiasts
Chemistry
Experimental dataset of electrochemical efficiency of a Direct Borohydride Fuel Cell (DBFC) with Pd/C, Pt/C and Pd decorated Ni–Co/rGO anode catalysts, Sarmin Hamidi, Sadra Sabouri, Sepand Haghighi, Kasra Askari
The dataset includes Direct Borohydride Fuel Cell (DBFC) impedance and polarization tests with Pd/C, Pt/C, and Pd-decorated Ni–Co/rGO anode catalysts. The voltage, power density, and resistance of the DBFC change as a function of sodium borohydride weight percent (%), applied voltage, and anode catalyst loading, and are evaluated with polarization and impedance curves using an appropriate equivalent circuit of the fuel cell.
OPEM: Open Source PEM Fuel Cell Simulation Tool
The Open-Source PEMFC Simulation Tool (OPEM) is a modeling tool for evaluating the performance of proton exchange membrane fuel cells (PEMFCs). The package is a collection of static and dynamic models that predict the optimum operating parameters of a PEMFC. OPEM contains generic models that accept as input not only operating variables such as anode and cathode feed gas pressures and compositions, cell temperature, and current density, but also cell parameters such as the active area and membrane thickness.
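As a heavily hedged sketch, a static Amphlett analysis call looks roughly like this; the input keys below are recalled from the docs and may be incomplete or renamed in the current release, so treat this as the shape of the call rather than a recipe:

```python
from opem.Static.Amphlett import Static_Analysis

# Key names recalled from memory; verify against the OPEM documentation.
inputs = {
    "T": 343.15,      # cell temperature [K]
    "PH2": 1,         # anode (H2) partial pressure [atm]
    "PO2": 1,         # cathode (O2) partial pressure [atm]
    "A": 50.6,        # active area [cm^2]
    "l": 0.0178,      # membrane thickness [cm]
    "lambda": 23,     # membrane hydration parameter
    "N": 1,           # number of single cells
    "R": 0,           # optional extra resistance [ohm]
    "JMax": 1.5,      # maximum current density [A/cm^2]
    "i-start": 0, "i-stop": 75, "i-step": 0.1,  # current sweep [A]
}
Static_Analysis(InputMethod=inputs, TestMode=True, PrintMode=False, ReportMode=True)
```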
Biomedical Science
OPR: Optimized Primer Design Tool
Civil Engineering
[AGU-WRR24] Representative sample size for estimating saturated hydraulic conductivity via machine learning: A proof‐of‐concept study, Amin Ahmadisharaf, Reza Nematirad, Sadra Sabouri, Yakov Pachepsky, Behzad Ghanbarian
Machine learning is widely used across disciplines, but hydrology has often overlooked the impact of data heterogeneity and sample size. In this study, we used ~18k soil samples from the USKSAT database to analyze how training size affects ML accuracy in estimating saturated hydraulic conductivity (Ks). Using XGBoost and repeated random subsets, we found that even with large datasets, learning and validation curves didn’t plateau.
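A hedged sketch of that learning-curve idea on synthetic data (not the paper's code or the USKSAT dataset): grow the training subset, refit repeatedly, and watch whether validation R² levels off.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

# Synthetic stand-in data; the real study uses ~18k USKSAT soil samples.
X, y = make_regression(n_samples=18_000, n_features=8, noise=10.0, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

rng = np.random.default_rng(0)
for size in [500, 2_000, 8_000, len(X_train)]:
    scores = []
    for _ in range(5):  # repeated random subsets of a given training size
        idx = rng.choice(len(X_train), size=size, replace=False)
        model = XGBRegressor(n_estimators=200, max_depth=6, verbosity=0)
        model.fit(X_train[idx], y_train[idx])
        scores.append(r2_score(y_val, model.predict(X_val)))
    print(f"train size={size:>6}  mean validation R^2={np.mean(scores):.3f}")
```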
News
Mar 2025: The Python Software Foundation (PSF) awarded a grant to our Nava library for adding new OS-based sound engines and integrating it into notebooks.
Feb 2025: NLnet awarded our PyCM library a one-year grant through the NGI0 Commons Fund for adding new features such as a distance/similarity matrix, data distribution analysis, and hardware benchmarking of the library.
Jan 2025: My paper Trust dynamics in AI-assisted development: Definitions, factors, and implications was accepted to the International Conference on Software Engineering (ICSE) 2025. I will present my work remotely in early May.
Sep 2024: I was awarded a Trelis AI Grant for developing a RESTful API for PyCM, enhancing accessibility to machine learning statistical post-processing tools.
May 2024: The Python Software Foundation (PSF) awarded a grant to our ASCII Art library for further development and new features such as multi-line art and custom font support.