
Migrant Voices, Local News: Insights on Bridging Community Needs with Media Content

Published 17 Apr 2026 in cs.CL | (2604.16651v1)

Abstract: Research shows news consumption differs across demographics, yet little is known about non-mainstream audiences, especially in relation to local media. Our study addresses this gap by examining how French-speaking migrants in a mid-size European city engage with local news, and whether their needs are reflected in coverage. Eight community members participated in focus groups, whose insights guided the selection of natural language processing methods (topic modeling, information retrieval, sentiment analysis, and readability) applied to over 2000 hyper-local news articles. Results showed that while articles frequently covered local events, gaps remained in topics important to participants. Sentiment analysis revealed a generally positive tone, and readability measures indicated an intermediate-advanced French level, raising questions about accessibility for integration. Our work contributes to bridging the gap between local news platforms' content and diverse readers' needs, and could inform local media organizations about opportunities to expand their current news story coverage to appeal to more diverse audiences.

Summary

  • The paper's main contribution is integrating community-derived qualitative insights with computational NLP to expose content gaps between local news and migrant needs.
  • Using a mixed-methods design, it quantitatively evaluates sentiment, topic coverage, and readability across 2,666 French news articles to reveal underrepresentation in key thematic areas.
  • The findings advocate for dynamic, community-responsive content curation and inclusive communication strategies, suggesting actionable pathways for media reform.

Bridging Community Needs with Local Media: A Mixed-Methods Computational Analysis of Migrant News Consumption

Introduction and Problem Formulation

The paper "Migrant Voices, Local News: Insights on Bridging Community Needs with Media Content" (2604.16651) addresses the persistent deficiency in understanding how local news ecosystems align with the informational and representational needs of migrant communities, an especially pertinent issue in urban European contexts with significant migrant populations. Extant literature documents both the underrepresentation and misrepresentation of migrants in media, as well as the role of local journalism in fostering civic belonging; however, there has been insufficient integration of computational analysis and bottom-up community engagement to systematically quantify these gaps or alignments.

This research employs a mixed-methods design, integrating qualitative focus group data with quantitative NLP applied to 2,666 hyper-local French-language newspaper articles. The central research questions probe: (1) migrant community needs, sentiments, and consumption patterns regarding local news, and (2) the extent to which these needs are reflected in actual hyper-local media coverage, as revealed through topic modeling, information retrieval, sentiment analysis, and text readability assessments.

Research Methodology

The methodology explicitly anchors computational analysis in community-derived themes, ensuring qualitative insights are not decoupled from quantitative findings. The research process involves two gender-stratified focus groups, designed to surface nuanced perspectives regarding intersectional experiences of news consumption among migrants. Detailed qualitative coding identifies emergent topics, sentiment toward existing media, and perceptions of accessibility.

These qualitative themes inform an NLP pipeline encompassing:

  • Topic Modeling: BERTopic on contextual (MiniLM) embeddings, with UMAP for dimensionality reduction and HDBSCAN for fine-grained clustering, adapted to French-language data.
  • Information Retrieval: BERT-based semantic search measuring how much coverage the salient topics identified in the focus groups receive, using a pragmatically chosen cosine similarity threshold.
  • Sentiment Analysis: VADER (adapted for French) applied at the title, subtitle, and article-body levels to extract polarity distributions.
  • Text Readability: CEFR classification (A1–C2, with C1 and C2 merged into a single C level) using a CamemBERT model fine-tuned for L2 French proficiency, to assess accessibility for language learners.

Figure 1: End-to-end workflow integrating focus group-guided theme extraction with sequential NLP analysis on the target news corpus, culminating in an aligned synthesis.

Corpus Statistics and Temporal Distribution

The news article dataset contains 2,666 articles spanning four years, extracted together with structural metadata that enables stratification by theme and time. The publication distribution reveals both event-driven density and categorical imbalances.

Figure 2: Article publication dynamics and thematic category distributions over time, reflecting event-driven spikes and a dominant "Going Out" category.
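Stratifying by time and theme amounts to simple grouping over article metadata. The sketch below is minimal and rests on two assumptions: each article record carries a publication date and a single category label. The records and every category name except "Going Out" are hypothetical and are not taken from the paper's corpus.

```python
from collections import Counter
from datetime import date

# Hypothetical article metadata: (publication date, category label).
# Only "Going Out" comes from the source; the other labels are illustrative.
articles = [
    (date(2022, 3, 4), "Going Out"),
    (date(2022, 3, 18), "Going Out"),
    (date(2022, 3, 21), "Politics"),
    (date(2022, 4, 2), "Going Out"),
    (date(2023, 1, 9), "Society"),
]

def monthly_counts(records):
    """Count articles per (year, month) so that event-driven spikes stand out."""
    return Counter((d.year, d.month) for d, _ in records)

def category_shares(records):
    """Return each category's share of the corpus, largest first."""
    counts = Counter(cat for _, cat in records)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}

print(sorted(monthly_counts(articles).items()))
print(category_shares(articles))
```

Plotting the monthly counts as a time series would give a figure like the one above. The category shares make a dominant category easy to spot.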

Key Qualitative Insights: Migrant News Consumption and Needs

Participants cited local news as instrumental for both language acquisition and socio-cultural integration, yet highlighted considerable barriers: language complexity, irrelevant content (especially for youth or specific intersectional groups), and a surplus of emotionally negative coverage that induces selective avoidance. Notably, participants did not universally report misrepresentation in hyper-local media, in contrast to the broader literature on mainstream media. They did, however, articulate significant thematic gaps: limited coverage of sports, nature, humanitarianism, migration, and generational interests.

A recurring preference emerged for investigative journalism and viewpoint diversity, with skepticism about both the neutrality of traditional outlets and the reliability of social networks. The importance of readability and clear audience targeting (e.g., labeling content for different age demographics) was underscored.

Computational Findings and Alignment with Community Needs

Topic Modeling

Topic modeling yielded nine coherent themes, plus a set of outlier documents. The themes centered primarily on leisure, culture, local history, domestic issues (including sensitive topics), and city infrastructure. No substantial topic clusters emerged for migration, humanitarianism, or contemporary youth concerns, which corroborates participants' claims of coverage gaps.
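Checking community themes against the fitted topics can be sketched as follows. BERTopic labels documents that HDBSCAN leaves unclustered as topic -1, and this sketch follows that convention. Everything else is a hypothetical stand-in, not the authors' procedure: the topic assignments, the per-topic keyword sets, the focus-group seed terms, and the keyword-overlap matching rule.

```python
from collections import Counter

# Hypothetical per-article topic assignments; -1 marks outliers.
assignments = [0, 0, 1, 2, -1, 0, 1, -1, 2, 2]

# Hypothetical top keywords per topic (illustrative, not the paper's topics).
topic_keywords = {
    0: {"concert", "festival", "exposition"},
    1: {"histoire", "patrimoine", "musée"},
    2: {"travaux", "route", "tram"},
}

# Hypothetical focus-group themes, each with a few French seed terms.
themes = {
    "culture": {"festival", "musée"},
    "migration": {"migrant", "asile", "réfugié"},
    "sport": {"football", "match"},
}

def topic_sizes(labels):
    """Return cluster sizes (outliers excluded) and the outlier share."""
    counts = Counter(labels)
    outliers = counts.pop(-1, 0)
    return dict(counts), outliers / len(labels)

def uncovered_themes(themes, topic_keywords):
    """List themes whose seed terms overlap no topic's keyword set."""
    return sorted(
        name for name, seeds in themes.items()
        if not any(seeds & kws for kws in topic_keywords.values())
    )

sizes, outlier_share = topic_sizes(assignments)
print(sizes, outlier_share)
print(uncovered_themes(themes, topic_keywords))
```

Keyword overlap is a deliberately crude matching rule. In practice, comparing theme and topic embeddings would be more robust.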

Information Retrieval

BERT-based similarity search revealed quantifiable deficits: only 34 articles matched 'sport', 36 matched 'Africa', and 38 matched 'feminism', while 'inequality' yielded 191 results and 'migration' 82. Several topics identified in the focus groups appeared only marginally in the corpus. For nuanced local issues (e.g., neighborhood centers, mobility controversies), direct keyword search found only 19 relevant articles each, indicating minimal sustained coverage despite the local salience participants reported.
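Per-query counts like these come from scoring every article against a query embedding and keeping those above a cosine similarity cutoff. The sketch below is minimal and assumes the embeddings are already computed. The 3-dimensional vectors stand in for real BERT-style embeddings, and the 0.5 threshold is hypothetical, since the paper's exact cutoff is not stated here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def count_matches(query_vec, doc_vecs, threshold):
    """Count documents whose similarity to the query meets the threshold."""
    return sum(1 for d in doc_vecs if cosine(query_vec, d) >= threshold)

# Toy "embeddings"; a real pipeline would encode each article with a BERT-style model.
docs = [
    [0.9, 0.1, 0.0],
    [0.8, 0.3, 0.1],
    [0.0, 1.0, 0.2],
    [0.1, 0.0, 1.0],
]
queries = {"sport": [1.0, 0.0, 0.0], "migration": [0.0, 0.0, 1.0]}
THRESHOLD = 0.5  # hypothetical cutoff

counts = {q: count_matches(v, docs, THRESHOLD) for q, v in queries.items()}
print(counts)
```

The threshold is the main design lever. Raising it trades recall for precision, so the reported counts should be read relative to whatever cutoff was chosen.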

Sentiment Analysis

Contrary to participants' perceptions of news negativity, sentiment analysis showed positive polarity outweighing negative polarity at every text granularity. Titles and subtitles carried stronger sentiment than article bodies, consistent with established journalistic practices for capturing attention. This raises questions about cognitive biases in news perception. It also suggests that media organizations have an opportunity to communicate their positive editorial tone more effectively.

Figure 3: Log-scale distribution of sentiment polarity by article segment; positive sentiment systematically outweighs negative sentiment.


Figure 4: Relative share of high polarity scores by segment; titles and subtitles systematically exhibit higher sentiment than article bodies, both positive and negative.
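The segment-level comparison can be sketched from VADER-style compound scores, which range from -1 to 1. The ±0.05 polarity thresholds follow the convention recommended in the VADER documentation. Three further elements are assumptions rather than the paper's values: the toy scores, the 0.5 cutoff for a "high polarity" score, and the way shares are aggregated.

```python
# Hypothetical compound scores in [-1, 1], one list per article segment.
scores = {
    "title":    [0.62, -0.55, 0.70, 0.00, 0.40],
    "subtitle": [0.51, 0.20, -0.10, 0.66, 0.03],
    "body":     [0.12, -0.08, 0.30, 0.04, 0.22],
}

HIGH = 0.5  # hypothetical cutoff for a "high polarity" score

def polarity(c):
    """Label a compound score using the conventional +/-0.05 thresholds."""
    if c >= 0.05:
        return "pos"
    if c <= -0.05:
        return "neg"
    return "neu"

def summarize(values):
    """Return positive, negative, and high-polarity shares for one segment."""
    n = len(values)
    labels = [polarity(c) for c in values]
    return {
        "pos": labels.count("pos") / n,
        "neg": labels.count("neg") / n,
        "high": sum(1 for c in values if abs(c) >= HIGH) / n,
    }

for segment, values in scores.items():
    print(segment, summarize(values))
```

In this toy data every segment leans positive. Only titles and subtitles reach the high-polarity cutoff, mirroring the pattern described above.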

Readability Analysis

Readability assessment placed the majority of articles at the B1 (intermediate) or C (advanced) CEFR levels, with fewer than 10% at beginner levels (A1/A2). The corpus is therefore largely inaccessible to early-stage L2 French learners, which may constitute a significant exclusionary barrier for some migrants. Fully functional access to the newspaper requires at least upper-intermediate French proficiency.

Figure 5: CEFR-level distribution of article readability, indicating predominance of intermediate-to-advanced complexity.
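Once a classifier has assigned a CEFR level to each article, computing the level distribution is straightforward. The sketch below assumes those per-article predictions already exist; the predictions themselves are hypothetical. It applies the study's merging of C1 and C2 into a single C level and reports the beginner (A1/A2) share.

```python
from collections import Counter

# Hypothetical per-article CEFR predictions from a readability classifier.
predictions = ["B1", "C1", "B2", "C2", "B1", "A2", "C1", "B1", "B2", "C2"]

LEVELS = ("A1", "A2", "B1", "B2", "C")

def merge_c(level):
    """Collapse C1 and C2 into a single C level."""
    return "C" if level in ("C1", "C2") else level

def level_shares(preds):
    """Return the share of articles at each (merged) CEFR level."""
    counts = Counter(merge_c(p) for p in preds)
    total = len(preds)
    return {lvl: counts.get(lvl, 0) / total for lvl in LEVELS}

shares = level_shares(predictions)
beginner = shares["A1"] + shares["A2"]
print(shares)
print(f"beginner share: {beginner:.0%}")
```

The same per-article labels could support filtering or labeling content by proficiency level, in line with the readability stratification discussed below.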

Implications and Theoretical Significance

The findings support the value of a mixed-methods, audience-centered computational journalism approach for both surfacing content gaps and identifying misalignments between editorial strategy and community needs. The measured lack of coverage on themes that migrants identified as salient quantifies editorial blind spots that a superficial corpus analysis would miss. This aligns with the literature on representational disparities in local journalism. The divergence between participants' perceived negativity and the computationally measured positive sentiment points to perceptual biases and carries implications for editorial transparency and news literacy.

From a systems design perspective, these results argue for dynamic, community-responsive content curation, improved readability stratification, and inclusive topic sourcing via participatory or co-production mechanisms. Additionally, the hybrid pipeline presented here demonstrates an actionable framework for ongoing audience-aligned content audits in local journalism.

Speculation on Future AI Developments

This research trajectory could be expanded through:

  • Active-learning based topic modeling that iteratively incorporates community feedback for theme granularity adjustment.
  • Multimodal sentiment/engagement analytics, including integration of social media metadata and in situ user interaction data.
  • Adaptive readability personalization targeting L2 proficiency identification and live document simplification.
  • Recommender systems for local news platforms explicitly tuned to underrepresented audience needs, leveraging weakly supervised NLU approaches for low-signal topic discovery.
  • Longitudinal studies across multiple media outlets to produce comparative alignment measures at regional or national scales.

Conclusion

This work presents a rigorous, empirically grounded methodology for diagnosing and quantifying the degree of alignment between hyper-local media content and the informational needs of migrant communities. By synthesizing qualitative focus group inquiry with a multi-pronged computational NLP pipeline, the authors provide substantive evidence of thematic coverage gaps, of participants overestimating news negativity, and of accessibility barriers. The implications extend to both the design of empathetic, inclusive news systems and the theoretical modeling of audience–media alignment, establishing a baseline for interventions in computational journalism and participatory media analytics.
