TensorFlowSNA - a master's thesis project

Mastering TensorFlow SNA

A quite advanced master's thesis research project

About

Quantitative analysis of the evolution of the TensorFlow collaborative social network over time using SAOMs (Stochastic Actor-Oriented Models)

Motivation

  • Increase our understanding of collaboration and information sharing in open-source coopetitive software ecosystems, i.e. open-coopetition (see Teixeira 2023).
  • Assess whether the evolution of the TensorFlow collaborative open-source software ecosystem is a matter of (1) randomness, (2) well-known mechanisms of network evolution, or (3) strategy as executed by the firms that contribute to the software ecosystem.

Key references

Key theoretical references
  • Contractor, N. S., Wasserman, S., & Faust, K. (2006). Testing multitheoretical, multilevel hypotheses about organizational networks: An analytic framework and empirical example. Academy of Management Review, 31(3), 681-703.
    Available at JSTOR https://www.jstor.org/stable/20159236.
  • Teixeira, J. A., Ahmed, S. S., Laine-Kronberg, L., Mezei, J., & Smailhodzic, E. (2025). Towards understanding open and coopetitive platform ecosystems: The case of TensorFlow. In Proceedings of the 33rd European Conference on Information Systems (ECIS 2025), AIS (conditionally accepted 28 Feb 2025). Open-access right here.
  • Teixeira, J. A. (2023). Towards understanding open-coopetition - Lessons from the automotive industry. In Proceedings of the 44th International Conference on Information Systems (ICIS 2023), AIS. Open-access right here. Also available from AISel https://aisel.aisnet.org/icis2023/isdesign/isdesign/5/ .
  • Li, X., Zhang, Y., Osborne, C., Zhou, M., Jin, Z., & Liu, H. (2025). Systematic literature review of commercial participation in open source software. ACM Transactions on Software Engineering and Methodology, 34(2), 1-31.
    Available at ACM DL https://doi.org/10.1145/3690632.
Key methodological references
Related working papers from others
  • Osborne, C., Daneshyan, F., He, R., Ye, H., Zhang, Y., & Zhou, M. (2024). Characterising Open Source Co-opetition in Company-hosted Open Source Software Projects: The Cases of PyTorch, TensorFlow, and Transformers. arXiv preprint arXiv:2410.18241. Open-access at https://doi.org/10.48550/arXiv.2410.18241 .

Aim and research questions

Overall aim:
  • Advance our understanding of open source and open-coopetition by looking at the case of TensorFlow.
Research questions:
  • How does the TensorFlow collaborative social network evolve over time?
  • Is it a matter of (1) randomness, (2) well-known mechanisms of network evolution, or (3) behavioural strategy from the firms that contribute to the TensorFlow ecosystem?

Tools

Tools for mining git repositories with SNA
Tools for the visualization of social networks
  • visone is software for the visual creation, transformation, exploration, analysis and representation of network data, jointly developed at the University of Konstanz and the Karlsruhe Institute of Technology.
  • Tulip is an information visualization framework dedicated to the analysis and visualization of relational data. Tulip aims to provide the developer with a complete library, supporting the design of interactive information visualization applications for relational data that can be tailored to the problems he or she is addressing. Developed by LaBRI, University of Bordeaux, France.
Tools for the statistical analysis of social networks
  • statnet is a suite of open source R-based software packages for network analysis, along with a comprehensive set of training materials. Developed by Pavel Krivitsky, Skye Bender-deMoll, Michał Bojanowski, Carter T. Butts, Steven M. Goodreau, Mark S. Handcock, David R. Hunter, Chad Klumb, and Martina Morris among others.
  • Goldfish is a software tool (i.e. R package) for the analysis of time-stamped network data using a variety of models. In particular, it implements different types of Dynamic Network Actor Models (DyNAMs), a class of models that is tailored to the study of actor-oriented network processes through time. Goldfish also implements different versions of tie-oriented relational event models. Developed by members of the Chair of Social Networks at ETH Zürich and James Hollway at the Graduate Institute in Geneva.
  • RSiena is an R package designed for the analysis of longitudinal network data using Stochastic Actor-Oriented Models (SAOMs). Developed by Tom A.B. Snijders and his colleagues, RSiena allows researchers to model and understand the dynamics of social networks over time. With RSiena, you can analyze how network ties evolve based on various factors, such as network structure, actor attributes, and external influences. The software is particularly useful for studying the formation and dissolution of ties in social networks, making it a valuable tool for sociologists, organizational researchers, and other social scientists.
  • relevent is an R package for Relational Event Models (REMs), a powerful tool for analyzing event data in social networks. Unlike traditional network models that focus on static snapshots, REMs are designed to handle continuous streams of interaction events, such as emails, phone calls, or social media posts. `relevent` allows researchers to model the occurrence of relational events based on various factors, including past interactions, network structure, and actor attributes. This makes REMs particularly useful for studying the dynamics of communication and interaction in social networks.

Data

The relational network data is retrieved from the Git repository using ScrapLogGit2Net. The tool, first described in Teixeira et al. (2015), connects developers who co-edit the same source-code file. The underlying assumption is that co-editing the same source file is a trace of cooperative and/or information-sharing behaviour.
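The co-editing rule can be illustrated with a minimal Python sketch. It is not the ScrapLogGit2Net implementation: the function name `co_editing_edges`, the `(developer_email, file_path)` input format, and the sample addresses are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def co_editing_edges(changes):
    """Return undirected edges linking developers who edited the same file.

    `changes` is an iterable of (developer_email, file_path) pairs, a
    hypothetical simplification of what a git log parser would yield.
    """
    editors_by_file = defaultdict(set)
    for email, path in changes:
        editors_by_file[path].add(email)

    edges = set()
    for editors in editors_by_file.values():
        # Every pair of developers who touched the same file gets one edge.
        # Sorting makes (a, b) and (b, a) collapse into a single tuple.
        for a, b in combinations(sorted(editors), 2):
            edges.add((a, b))
    return edges


# Toy input with made-up addresses and paths.
changes = [
    ("ann@example.com", "core/ops.cc"),
    ("bob@example.org", "core/ops.cc"),
    ("bob@example.org", "python/api.py"),
    ("eve@example.net", "python/api.py"),
    ("ann@example.com", "README.md"),
]
edges = co_editing_edges(changes)
for a, b in sorted(edges):
    print(a, "--", b)
```

In this toy input, a file edited by a single developer produces no edge, so a developer who only ever works alone would not appear among the edges.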

The TensorFlow collaborative network during a given time slice t can be formally defined as:
G_t = (V, A_v, E)
Where:
V = the set of nodes, representing the developers contributing to the TensorFlow core open-source software project.
E = the set of edges, each connecting two developers who have worked on the same source-code file.
A_v = the set of node attributes, capturing each developer's company affiliation. This information is extracted from the email address of each developer and/or the GitHub API.
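One way to hold such a time slice in memory is sketched below in Python. The class `CollaborationNetwork` and the helper `affiliation_from_email` are hypothetical names, and using the raw e-mail domain as the affiliation is a deliberately naive assumption: it ignores the GitHub API lookup mentioned above and would mislabel developers who commit from personal addresses.

```python
from dataclasses import dataclass, field


def affiliation_from_email(email):
    """Naive guess: use the e-mail domain as the affiliation label."""
    return email.rsplit("@", 1)[-1].lower()


@dataclass
class CollaborationNetwork:
    """One time slice G_t = (V, A_v, E) of the collaborative network."""
    nodes: set = field(default_factory=set)          # V: developer e-mails
    affiliation: dict = field(default_factory=dict)  # A_v: e-mail -> affiliation
    edges: set = field(default_factory=set)          # E: unordered developer pairs

    def add_developer(self, email):
        self.nodes.add(email)
        self.affiliation[email] = affiliation_from_email(email)

    def add_collaboration(self, a, b):
        if a == b:
            return  # a developer co-editing with themselves is not an edge
        self.add_developer(a)
        self.add_developer(b)
        # frozenset makes the edge undirected: {a, b} == {b, a}
        self.edges.add(frozenset((a, b)))


# Toy usage with made-up addresses.
g = CollaborationNetwork()
g.add_collaboration("ann@example.com", "bob@example.org")
g.add_collaboration("bob@example.org", "ann@example.com")  # same edge again
print(len(g.nodes), len(g.edges), g.affiliation)
```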

Description of the collected data

Metadata and paradata briefs
Mined repository: https://github.com/tensorflow/tensorflow.git
Mining tool: https://github.com/jaateixeira/ScrapLogGit2Net
Miner: Jose Teixeira
Last collection: 18 October 2024
Covered lifespan of the project: 7 Nov 2013 - 12 April 2024
Segmentation: Year by year - 2013 to 2024
Number of networks: 1 capturing the overall project lifespan + 9 capturing a year each
Nodes: Individual software developers (bots were filtered out), identified by e-mail
Edges: Cooperation and information sharing, associated by co-editing the same source-code file
Node attributes: Organizational affiliation, associated by e-mail domain and GitHub API
File/Network format: graphml - http://graphml.graphdrawing.org/
Archival of the File/Network/GraphML files: https://github.com/jaateixeira/ScrapLogGit2Net/tree/main/test-data/TensorFlow/icis-2024-wp-networks-graphML
Related publications: Teixeira, J. A., Ahmed, S. S., Laine-Kronberg, L., Mezei, J., & Smailhodzic, E. (2025). Towards understanding open and coopetitive platform ecosystems: The case of TensorFlow. In Proceedings of the 33rd European Conference on Information Systems (ECIS 2025), AIS (conditionally accepted 28 Feb 2025). Open-access right here.
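Because the networks are archived as GraphML, they can be inspected with nothing more than the Python standard library. The sketch below counts nodes and edges and lists the node attribute names. The function name `summarize_graphml`, the embedded sample document, and its key id `d0` are assumptions for illustration, so the actual archived files may declare their attributes differently.

```python
import xml.etree.ElementTree as ET

# Standard GraphML XML namespace.
GRAPHML_NS = {"g": "http://graphml.graphdrawing.org/xmlns"}


def summarize_graphml(xml_text):
    """Count nodes and edges and list node attribute names in a GraphML string."""
    root = ET.fromstring(xml_text)
    # <key> elements map short ids (e.g. "d0") to readable attribute names.
    key_names = {
        key.get("id"): key.get("attr.name")
        for key in root.findall("g:key", GRAPHML_NS)
        if key.get("for") == "node"
    }
    nodes = root.findall(".//g:node", GRAPHML_NS)
    edges = root.findall(".//g:edge", GRAPHML_NS)
    attribute_names = set()
    for node in nodes:
        for data in node.findall("g:data", GRAPHML_NS):
            key_id = data.get("key")
            attribute_names.add(key_names.get(key_id, key_id))
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "node_attributes": sorted(attribute_names),
    }


# Tiny made-up GraphML document standing in for one of the archived files.
SAMPLE = """<graphml xmlns="http://graphml.graphdrawing.org/xmlns">
  <key id="d0" for="node" attr.name="affiliation" attr.type="string"/>
  <graph edgedefault="undirected">
    <node id="n0"><data key="d0">google</data></node>
    <node id="n1"><data key="d0">intel</data></node>
    <node id="n2"><data key="d0">google</data></node>
    <edge source="n0" target="n1"/>
    <edge source="n1" target="n2"/>
  </graph>
</graphml>"""

summary = summarize_graphml(SAMPLE)
print(summary)
```

To apply it to a downloaded file, read the file contents into a string first and pass that string to `summarize_graphml`.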

Networks

Network 1 - Capturing the overall lifespan of the project

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 4219
Number of edges: 378309
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: 7 Nov 2013 - 12 April 2024
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 2 - Capturing code collaboration during 2015

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 47
Number of edges: 170
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2015
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 3 - Capturing code collaboration during 2016

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 610
Number of edges: 14368
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2016
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 4 - Capturing code collaboration during 2017

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 916
Number of edges: 23101
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2017
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 5 - Capturing code collaboration during 2018

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 923
Number of edges: 24424
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2018
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 6 - Capturing code collaboration during 2019

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 943
Number of edges: 26531
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2019
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 7 - Capturing code collaboration during 2020

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 896
Number of edges: 26416
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2020
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 8 - Capturing code collaboration during 2021

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 728
Number of edges: 19396
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2021
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 9 - Capturing code collaboration during 2022

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 617
Number of edges: 14934
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2022
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Network 10 - Capturing code collaboration during 2023

Node classifier: Human software developer
Edge classifier: Collaboration and information sharing
Number of nodes: 588
Number of edges: 15129
Node attributes: e-mail, color, affiliation
Edge attributes: Null
Captured time span: Year 2023
Network data file format: graphml - http://graphml.graphdrawing.org/
Network data available: https://github.com/jaateixeira/ScrapLogGit2Net (here)

Dataset with all 10 networks: Download tarball here.
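The node and edge counts reported for the yearly networks allow a quick comparison of how tightly connected each yearly slice is. The Python sketch below computes density and mean degree from those counts. It assumes each network is an undirected simple graph (no self-loops, no parallel edges), which is an assumption about the data rather than something stated in the network descriptions.

```python
# (nodes, edges) as reported for Networks 2 to 10 above.
YEARLY_COUNTS = {
    2015: (47, 170),
    2016: (610, 14368),
    2017: (916, 23101),
    2018: (923, 24424),
    2019: (943, 26531),
    2020: (896, 26416),
    2021: (728, 19396),
    2022: (617, 14934),
    2023: (588, 15129),
}


def density(n_nodes, n_edges):
    """Share of possible developer pairs that are connected: 2E / (N(N-1))."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def mean_degree(n_nodes, n_edges):
    """Average number of collaborators per developer: 2E / N."""
    if n_nodes == 0:
        return 0.0
    return 2 * n_edges / n_nodes


for year, (n, m) in sorted(YEARLY_COUNTS.items()):
    print(f"{year}: density={density(n, m):.4f}  mean degree={mean_degree(n, m):.1f}")
```

Such descriptive statistics only summarize each slice; they do not by themselves distinguish between randomness, network mechanisms, and firm strategy.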

Contact

Jose Teixeira < jose.teixeira AT abo.fi >