Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

About me

Posts

Future Blog Post

less than 1 minute read

Published:

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published:

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Publications

Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval

Published in Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, 2022

Information-seeking dialogue systems, including knowledge identification and response generation, aim to respond to users with fluent, coherent, and informative answers based on users’ needs. This paper discusses our proposed approach, Docalog, for the DialDoc-22 (MultiDoc2Dial) shared task. Docalog identifies the most relevant knowledge in the associated document, in a multi-document setting. Docalog is a three-stage pipeline consisting of (1) a document retriever model (DR. TEIT), (2) an answer span prediction model, and (3) an ultimate span picker that decides on the most likely answer span out of all predicted spans. In the test phase of MultiDoc2Dial 2022, Docalog achieved F1-scores of 36.07% and 28.44% and SacreBLEU scores of 23.70% and 20.52%, respectively, on the MDD-SEEN and MDD-UNSEEN folds.

Recommended citation: Sayed Hesam Alavian, Ali Satvaty, Sadra Sabouri, Ehsaneddin Asgari, and Hossein Sameti. 2022. Docalog: Multi-document Dialogue System using Transformer-based Span Retrieval. In Proceedings of the Second DialDoc Workshop on Document-grounded Dialogue and Conversational Question Answering, pages 142–147, Dublin, Ireland. Association for Computational Linguistics. https://aclanthology.org/2022.dialdoc-1.16/

naab: A ready-to-use plug-and-play corpus for Farsi

Published in Journal of Artificial Intelligence, Applications and Innovations 1 (2), 1-8, 2022

Huge corpora of textual data are known to be a crucial need for training deep models such as transformer-based ones. This need is even more pressing in lower-resource languages like Farsi. We propose naab, the biggest cleaned and ready-to-use open-source textual corpus in Farsi. It contains about 130GB of data, 250 million paragraphs, and 15 billion words. The project name is derived from the Farsi word ناب (naab), which means pure and high-grade. We also provide the raw version of the corpus, called naab-raw, and an easy-to-use preprocessor that can be employed by those who want to make a customized corpus.

Recommended citation: Sabouri, Sadra, Elnaz Rahmati, Soroush Gooran, and Hossein Sameti. "naab: A ready-to-use plug-and-play corpus for Farsi." Journal of Artificial Intelligence, Applications and Innovations 1, no. 2 (2024): 1-8. https://arxiv.org/pdf/2208.13486

Trust dynamics in AI-assisted development: Definitions, factors, and implications

Published in International Conference on Software Engineering (ICSE) 2025, 2025

Software developers increasingly rely on AI code generation utilities. To ensure that “good” code is accepted into the code base and “bad” code is rejected, developers must know when to trust an AI suggestion. Understanding how developers build this intuition is crucial to enhancing developer-AI collaborative programming. In this paper, we seek to understand how developers (1) define and (2) evaluate the trustworthiness of a code suggestion and (3) how trust evolves when using AI code assistants. To answer these questions, we conducted a mixed-method study consisting of an in-depth exploratory survey with (n = 29) developers followed by an observation study (n = 10). We found that comprehensibility and perceived correctness were the most frequently used factors to evaluate code suggestion trustworthiness. However, the gap between developers’ definition and evaluation of trust points to a lack of support for evaluating trustworthy code in real time. We also found that developers often alter their trust decisions, keeping only 52% of original suggestions. Based on these findings, we extracted four guidelines to enhance developer-AI interactions. We validated the guidelines through a survey with (n = 7) domain experts and survey members (n = 8). We discuss the validated guidelines, how to apply them, and tools to help adopt them.

Recommended citation: Sabouri, Sadra, Philipp Eibl, Xinyi Zhou, Morteza Ziyadi, Nenad Medvidovic, Lars Lindemann, and Souti Chattopadhyay. "Trust dynamics in AI-assisted development: Definitions, factors, and implications." (2025). https://www.amazon.science/publications/trust-dynamics-in-ai-assisted-development-definitions-factors-and-implications

Talks

Trustworthy AI Code Generation Talk at USC+Amazon 3rd Annual Symposium

Published:

In this talk, I present our team’s findings on trustworthy AI code generation at the USC-Amazon 3rd Annual Symposium. Our research focused on exploring the psychological trust dynamics between programmers and AI code assistants. By surveying 27 computer science students and conducting an observational study of 10 software developers, we examined how developers perceive and rely on AI-generated code, the factors that influence their trust, and strategies for enhancing the reliability and transparency of AI tools.

Teaching

Teaching Assistant (DSCI552)

Graduate course, University of Southern California, Computer Science Department, 2024

DSCI552 - Machine Learning for Data Science

(Prof. Mohammad Reza Rajati)

I served as a teaching assistant for this course, where I was responsible for grading assignments, crafting exam questions, and holding regular office hours to support students. With 80 students enrolled, I worked alongside a team of 12 teaching assistants to ensure smooth course delivery.