16-18 Oct. 2018, Yichang, China


Text Data Science is Data Science dedicated to specific data such as documents and text material. It covers the use and development of known or new Data Science techniques (classification, feature selection, curation, statistical approaches, visualization, ...) for text and document processing.

The International Workshop of Text Data Science is organized during I-SPAN 2018:

15th International Symposium on Pervasive Systems, Algorithms and Networks

Big data, new mobile phone interfaces, simulation and databases are sources of a huge production of texts.
Even though text processing is not new, and computational linguistics has proposed many interesting analytical frameworks, we face new challenges.

The first challenge concerns how humanities and social scientists work with new computing tools and with data from the digital world, and in particular how they treat research objects (i.e. documents, entities, models, knowledge, rules, metadata, ...).
In most humanities and social sciences (HSS) disciplines, fieldwork, practices and situations provide strong frameworks.
Digital technology brings different assumptions: virtuality, simulation, heterogeneous objects, pooling of resources. Several questions arise for HSS specialists:
- How much credit should be given to databases and the accumulation of digital information?
- How can digital objects be integrated into an analysis process in HSS?
- What knowledge can we expect from digital processing?
- How can statistics and/or simulation be used to convince?
- How should data be shared? Is open data a myth or a reality in HSS?
There are many examples in which data provide a new and relevant perspective on a discipline, such as:
(1) Putting literary archives online and correlating the discourse of two authors from two different periods;
(2) Analysis of the social network of actors in medieval historical archives;
(3) The online publication of all scientific publications in a single database in order to extract the evolution of themes;
(4) Digital worlds on the internet generate controversies and structure economic life through the dark web and microblogs;
(5) Language teaching and distance learning benefit from online behavior observed through MOOCs;
(6) Written and oral corpora contribute to the analysis of language acquisition pathologies in clinical psychology, and make it possible to study neologisms and the evolution of languages in linguistics.

The second challenge is social media information. The popularity of social media and the volume of data they generate can be confusing. Social media have become a field of experimentation for social science and communication theories, on topics such as the handling of false information, e-democracy or citizen behavior.
More than 300 large social networks and media exist on the internet. Several offer APIs for retrieving limited amounts of data. This is a new opportunity to study the behavior of users through their comments, the text contained in videos and their exchanged messages. Such information is not available as traditional, well-curated corpora and requires new approaches, algorithms and frameworks for useful knowledge extraction.

The extraction of knowledge from texts offers great prospects for the exploitation of multimedia data (text, sound and graphics).
In particular, we can expect contributions from information extraction, classification and machine learning techniques, heterogeneous cross-references, automatic indexing and summarization, descriptive statistics and probabilistic analysis, and visual data mining; these require adaptation to different dataset contexts and to user needs.


The objective of this workshop is to bring together specialists in text and document processing, humanities and social sciences (HSS) specialists and data scientists to discuss the contribution of algorithms to the analysis of textual data produced (mainly) by humans or in HSS.


follow this link :



Machine Learning, Multivariate Data Analysis, Link and Network Analysis, Natural Language Processing, Information Retrieval, Big Data, Smart Data, Feature Selection, Knowledge Visualization, Complex Corpora, Curating Data, Multiparadigm Data Science Workflow, Sentiment Analysis, Recommendation Systems

Psychology, Sociology, History, Archaeology, Communication and Social Media, Digital Libraries, Literary Studies, Cultural and Heritage Studies, Didactics, Business Intelligence, Linguistics, Political Science, Cartography, Demography

Textual Data, Open data, Crowdsourcing

Text analysis and mining with applications to health and wellness, smart cities, energy, transport, environment, food, migration issues


Organization committee

Nicolas Turenne; UMR LISIS UPEM-INRA-CNRS; Paris
nturenne AT u-pem.fr

Zhikui Chen; Dalian University of Technology - School of Software; Dalian

Scientific committee

Yanto Chandra; City University of Hong Kong; Hong Kong

Kuang-hua Chen; National Taiwan University; Taipei

Qinran Dang; INALCO; Paris

Simon James Fong; University of Macau; Macau

Natalia Grabar; STL Université de Lille; Lille

Serge Heiden; UMR IHRIM ENS de Lyon; Lyon

Telmo Menezes; Berlin University, CMB; Berlin

Céline Poudat; UMR BCL - Université de Nice; Nice

Patrick Paroubek; University Paris-Sud; Paris

Shiming Simon Zhang; Baidu.com; Shanghai


Papers can be submitted via the EasyChair website.


Some papers selected by the scientific committee will be published, as extended versions in English, in a special issue of the Journal of Data Mining and Digital Humanities (JDMDH): https://jdmdh.episciences.org/


Paper Submission Deadline
June 15, 2018 

Notification of acceptance to authors
July 15, 2018

Aug 15, 2018

Advanced Registration
Aug 20, 2018

Date of workshop
October 16, 2018


Yichang, China (Hubei province)

China Three Gorges University
