International Workshop on Text Data Science

iwtds : International Workshop on Text Data Science

16-18 Oct. 2018, Yichang, China

CONTEXT

Text Data Science is Data Science dedicated to a specific kind of data: documents and text material. It covers the use and development of known or new Data Science techniques (classification, feature selection, curation, statistical approaches, visualization...) for text and document processing.

The International Workshop on Text Data Science is organized during I-SPAN 2018

15th International Symposium on Pervasive Systems, Algorithms and Networks
https://grid.chu.edu.tw/ispan2018/

Big data, new mobile phone interfaces, simulation and databases are sources of a huge production of texts.
Even though text processing is not new, and computational linguistics has proposed many interesting analytical frameworks, we face new challenges.

The first challenge concerns how humanities and social scientists work with new computing tools and with data that come from the digital world, and in particular how they treat research objects (i.e. documents, entities, models, knowledge, rules, metadata...).
In most humanities and social sciences (HSS) disciplines, fieldwork, practices and situations are strong frameworks.
Digital technology brings different assumptions: virtuality, simulation, heterogeneous objects, pooling of resources. Several questions arise for HSS specialists:
- How much credit should be given to databases and to the accumulation of digital information?
- How can digital objects be integrated into an analysis process in HSS?
- What knowledge can we expect from digital processing?
- How can one make a convincing case with statistics and/or simulation?
- How should data be shared? Is open data a myth or a reality in HSS?
There are many examples in which data provide a new and relevant perspective on a discipline, such as:
(1) Putting literary archives online and correlating the discourse of two authors from two different periods;
(2) Analysis of the social network of actors in medieval historical archives;
(3) The online publication of all scientific publications in a single database in order to extract the evolution of themes;
(4) Digital worlds on the internet generating controversies and structuring economic life through the dark web and microblogs;
(5) Language teaching and distance learning benefiting from online behavior through MOOCs;
(6) Written and oral corpora contributing to the analysis of language acquisition pathologies in clinical psychology, and making it possible to study neologisms and the evolution of languages in linguistics.

The second challenge is the case of social media information. The popularity of social media and the amount of data they generate are confusing. Social media have become a field of experimentation in the light of social science and communication theories, on topics such as the treatment of false information, e-democracy or citizen behavior.
More than 300 giant social networks and media platforms exist on the internet. Several offer APIs for retrieving data in limited amounts. This is a new opportunity to study user behavior through comments, text contained in videos and exchanged messages. Such information is not available as traditional, well-curated corpora and requires new approaches, algorithms and frameworks for useful knowledge extraction.

The extraction of knowledge from texts offers great prospects for the exploitation of multimedia data (text, sound and graphics).
In particular, we can expect contributions from information extraction, classification and machine learning techniques, heterogeneous cross-references, automatic indexing and summarization, descriptive statistics and probabilistic analysis, and visual data mining, all of which require adaptation to different dataset contexts and to user needs.

TOPICS OF INTEREST

Machine Learning, Multivariate Data Analysis, Link and Network Analysis, Natural Language Processing, Information Retrieval, Big Data, Smart Data, Feature Selection, Knowledge Visualization, Complex Corpora, Curating Data, Multiparadigm Data Science Workflow, Sentiment Analysis, Recommendation Systems

Psychology, Sociology, History, Archaeology, Communication and Social Media, Digital Libraries, Literary Studies, Cultural and Heritage Studies, Didactics, Business Intelligence, Linguistics, Political Science, Cartography, Demography

Textual Data, Open Data, Crowdsourcing

Text analysis and mining with applications to health and wellness, smart cities, energy, transport, environment, food, migration issues

COMMITTEE

Organizing committee

Nicolas Turenne, UMR LISIS UPEM-INRA-CNRS, Paris
nturenne AT u-pem.fr

Zhikui Chen, Dalian University of Technology - School of Software, Dalian

Scientific committee

Yanto Chandra, City University of Hong Kong, Hong Kong

Kuang-hua Chen, National Taiwan University, Taipei

Qinran Dang, INALCO, Paris

Simon James Fong, University of Macau, Macau

Natalia Grabar, STL Université de Lille, Lille

Serge Heiden, UMR IHRIM ENS de Lyon, Lyon

Telmo Menezes, Berlin University, CMB, Berlin

Céline Poudat, UMR BCL - Université de Nice, Nice

Patrick Paroubek, University Paris-Sud, Paris

Shiming Simon Zhang, Baidu.com, Shanghai
