// for young scholars


The training courses will give young scholars the opportunity to acquire knowledge about computational methods and to develop the skills to apply these methods in their own research. First, all workshop participants will have the opportunity to take part in a training course on the programming language R. Afterwards, five parallel method training courses will cover different methods of data access, management, and analysis.

New data sets require new methods

Big data differs from conventional data and requires specific competencies that are currently not part of academic training in most social science and communication departments. Collaborations, especially with computer scientists, are vital. However, social scientists must acquire a basic understanding of these methods themselves in order to ask meaningful research questions and implement intelligent research designs. At the moment, knowledge and skills in computational social science (CSS) are not yet widespread among communication scholars.

Equipping communication scholars for computational social science

To equip communication scholars for computational social science, five parallel training courses will provide hands-on training in domains critical for integrating CSS methods into all parts of the research process. First, one training course (1) will deal with ways to acquire large amounts of data, either through a website's application programming interface (API) or with scraping methods. Second, three courses will focus on procedures for approaching large data sets: (2) automated content analysis, (3) topic modelling, and (4) social network analysis. Third, a training course (5) will provide knowledge of and skills in statistical techniques for analysing large data sets.


You can now find the full program, with scheduled times and locations, on the page “PROGRAM”.
Please note that times may be subject to change.



Learn R

1-day introductory course on the lingua franca of data science

R is an increasingly popular software environment for data analysis. Thanks to its open-source nature and active community, a large pool of free extensions (packages) for CSS methods is available. Learning R is therefore a fruitful basis for all subsequent courses.

This course offers a hands-on introduction to R and RStudio. We will go through first steps such as reading data, loading R packages, understanding different data structures, and organizing and commenting R code. Furthermore, the course will cover practical issues in statistical computing such as data wrangling, computing variables, string manipulation, and working with functions and loops. Finally, we will run some basic statistics (descriptive and regression analyses) and practice how to integrate results from different analyses.

We recommend attending this course if you have little or no experience with R and want to participate in one of the subsequent method training courses.

Trainer // Philipp Masur
Philipp Masur is a research assistant at the Department of Media Psychology at the University of Hohenheim (Stuttgart). He studied communication science, economics, and philosophy at Johannes Gutenberg University Mainz and Macquarie University in Sydney (Australia). He has worked at PRIME Research in Mainz and in the politics and current affairs department of ZDF. His research focuses on different aspects of computer-mediated communication and media usage patterns. Specifically, he is interested in the psychological experience of privacy.

Parallel Training Courses

Network Analysis
The rising prominence of social network analysis (SNA) has been mirrored in the development of specialized tools and computer programs for various kinds of networks. This general trend has been reinforced by the current data revolution. Innovative methods for studying these networks are often developed within the R framework. The workshop introduces various R packages for SNA and enables participants to construct, analyze, and visualize network data. First, we will concentrate on the different logic of each package in terms of graph initialization, their general advantages and disadvantages, and how to bridge these differences. After practicing how to handle network data in R, we will focus on using a variety of network measures that describe both actor positions and whole networks. Additionally, many built-in algorithms for community detection and network evolution can easily be applied once these first steps are mastered. The second main part covers more advanced techniques, especially Exponential Random Graph Models (ERGMs) and SIENA models. Both are cutting-edge methods for inferring causal mechanisms in social networks. After a short explanation of their mathematical and theoretical intuition, we will apply them to multiple empirical examples.
Trainer // Raphael Heiberger
Raphael Heiberger wrote his dissertation at the University of Bamberg (supervisor: Richard Münch), comparing the financial systems of the U.S. and Germany from a sociological perspective. He is currently a postdoc at the University of Bremen and was recently a visiting scholar at UC Berkeley (host: Neil Fligstein). Besides economic sociology, his research interests include social network analysis, natural language processing, Bayesian statistics (esp. machine learning), and programming (esp. R and Python).
Statistical Analysis of Large Data Sets
The aim of the workshop is to provide participants with theoretical and practical tools that are useful, if not necessary, for dealing with existing or yet-to-be-built large datasets in communication studies. The underlying idea is that by asking theory-driven questions of well-constructed datasets, researchers can avoid drowning in the “curse of dimensionality” and instead use these datasets to advance the frontier of knowledge in the field. The following topics will be covered, in this order:
- gathering data from online archives, or finding existing datasets that have been left unexplored
- devising feasible measurement strategies for the objects of interest (with a specific focus on political bias)
- developing empirical strategies that take seriously the caveat that “correlation does not imply causation”
- communicating the results to the public at large

Knowledge of panel data analysis (in particular fixed-effects regression) is required. Unlike the other courses, this course will use Stata as its statistical package for examples, routines, etc. Prior knowledge of Stata is not a prerequisite for attendance, but basic experience in writing programs for a statistical package such as R or SPSS is required.

Trainer // Riccardo Puglisi
Riccardo Puglisi is an associate professor of economics at the University of Pavia (Italy). He holds a doctorate in public finance (Pavia) and a PhD in economics (LSE). His research focuses on the political and economic role of mass media, public opinion formation, and public finance, and has been published in top general and top field journals in economics and political science, such as the Journal of the European Economic Association, the Journal of Politics, the Journal of Public Economics, and the Quarterly Journal of Political Science. He is a columnist for Corriere della Sera and a member of the editorial board of
Web Scraping & Data Management
This short course introduces students to acquiring and processing data from the Internet using the R statistical programming language. Students will learn core principles and a general-purpose toolkit for accessing web data in many forms. These include the major content formats of the free and open web, application programming interfaces (APIs), and both static and dynamically updating pages. We will also cover best practices and procedures for archiving, cleaning, and storing data taken from the web. Students can expect to leave with an understanding of the challenges and risks of acquiring and processing Internet data. They will gain experience scraping websites and accessing APIs on their own, and they will leave with a toolkit of code and reference material they can use to keep building their skills and keep abreast of future changes.
Homework will be provided to support the classroom activities. Students are strongly encouraged to attempt the daily homework assignments, as these will serve as launching points for the next day's class discussions and will confront students with real-life problems they will encounter when scraping the open web.
Trainer // Matt Loftis
Matt Loftis is an Assistant Professor of Political Science at Aarhus University in Aarhus, Denmark. He received his PhD from Rice University in Houston, Texas. His substantive research interests are governance, bureaucracy, and party politics, and all of his work makes use of the vast amounts of government and social media data freely available on the Internet to build large original data sets.
Automated Content Analysis
The workshop introduces the fundamentals of text mining with the R programming language. Combining short theoretical lectures with extended hands-on tutorials, it covers the concepts and application of linguistic preprocessing and lexicometric analyses. For preprocessing, we introduce importing textual data, tokenization, sentence segmentation, and part-of-speech tagging. For the actual content analysis, we focus on basic methods such as frequency analysis, key term extraction, and co-occurrence analysis.
The hands-on part relies on scripts in the programming language R. Basic knowledge of R is therefore strongly recommended in order to take part successfully in the tutorial sessions. Participants without such knowledge are strongly encouraged to attend the “R Training Course” on the Monday before our workshop, or to acquire some knowledge of R on their own (e.g. by using the swirl package,
Trainer // Gregor Wiedemann
Dr. Gregor Wiedemann works in the research groups Language Technology and Human-Centered Computing at the University of Hamburg. He studied political science and computer science in Leipzig and Miami. In 2016, he received his doctoral degree in computer science for his dissertation “Integrating Text Mining into Qualitative Data Analysis for Social Sciences”. Wiedemann has worked in several projects in the fields of digital humanities and computational social science where he developed methods and workflows to analyze large text collections.

Trainer // Andreas Niekler
Andreas Niekler, Dr.-Ing., born 1979, has been a research assistant at the Department of Natural Language Processing of the University of Leipzig since 2009. He develops computer-aided methods for social-scientific content analysis. He contributed to the research project "Post-Democracy and Neoliberalism" and to the interactive analysis platform Leipzig Corpus Miner (LCM). His work emphasizes methods of machine learning and data management. Before this, he taught in the Faculty of Media at the Leipzig University of Applied Sciences (HTWK) and at the Leipzig School of Media (LSoM), with a focus on media-based data management.
Topic Modeling
This training course will be taught by Wouter van Atteveldt (Vrije Universiteit Amsterdam; see profile below). Further information, including a full course description, will be provided soon.
Trainer // Wouter van Atteveldt
Wouter van Atteveldt is an assistant professor at VU University Amsterdam who specializes in text analysis methods. After his dissertation on Semantic Network Analysis (2008), he received an NWO VENI grant to automatically extract politicians’ quotes and paraphrases from news items and model their role in political discourse. He developed the Amsterdam Content Analysis Toolkit (AmCAT), has (co-)developed a number of R packages, and has given numerous (R) workshops on text analysis. His current methodological interests include grammatical analysis, topic models, sentiment analysis, and network methods; substantively, he is interested in long-term changes in journalistic norms.

How to apply

We offer 50 places for young scholars (postdoctoral researchers and doctoral students) from empirical disciplines interested in the study of communication, including but not limited to communication science, (media) psychology, sociology, and economics. Participants receive a full scholarship covering travel and accommodation costs. Applicants may choose between the full 5-day program (R workshop, CSS method training course, and workshop conference) and the 4-day program (CSS method training course and workshop conference).

To apply, scholars are asked to outline a (prospective) research question in their area of expertise and to discuss how CSS methods can add an innovative advantage to this question, how this advances theorizing in their area of communication research, and what potential, if any, CSS methods hold for the applicant’s academic profile.