Loïc Cudennec

Research Engineer at CEA LIST, working on High-Performance Embedded Computing. I received my Engineering Degree and completed my Master's thesis in 2005 at INSA de Rennes. I defended my Ph.D. thesis in 2009 at the University of Rennes 1, co-funded by the INRIA Centre Bretagne Atlantique, Sun Microsystems and the Regional Council of Brittany.

HPC for Embedded Systems: Many-core architectures, Compilers for dataflow programming languages, Parallelism compilation and reduction (Kalray MPPA), Cache coherency protocols. Grid computing: CoRDAGe, co-deployment and re-deployment of grid applications (project leader), Data consistency models and protocols, Peer-to-peer distributed systems (P2P), Sun Microsystems' P2P framework JXTA, Distributed Shared Memory (DSM), Grid'5000 platform.


Job offer

  1. French language
    Analysis of shared-data access patterns for selecting coherence protocols on massively parallel architectures.
    Master's internship available for 2015 at CEA, Saclay Centre. [www]
    To meet the demand for computing power while keeping energy consumption under control, processors with several hundred cores, known as many-core processors, are appearing in high-performance computing systems and embedded systems. Chips now on the market include Intel's Xeon Phi, which contains 64 cores, and Kalray's MPPA, which is made up of 256 cores. The performance of applications running on such architectures depends heavily on the programmability of the chip, and in particular on the mechanisms and techniques used to manage on-chip memory. The choice of data consistency model (the rules that determine how fresh a piece of data is when it is accessed) and of coherence protocols directly affects the efficiency of accesses, and poorly suited choices can quickly become a bottleneck for the application. In our previous work we proposed a multi-protocol compilation toolchain that associates a given protocol with each shared data item. In this context, an open research direction is to rely on static analysis of the source code to identify and characterize accesses to shared data (e.g. identifying the tasks of the application and the shared variables, building dependency graphs of shared-data accesses). The work proposed in this internship is to survey the state of the art in the analysis of shared-data accesses in parallel application code, and then to propose a solution that addresses the following points: 1. The ability to extract, from the source code, graphs representing the dependencies between shared-data accesses, 2. The characterization of memory access patterns based on that representation, 3. An automated analysis of the dependency graphs to detect data access patterns. Finally, the proposed solution must be implemented as a prototype to validate the internship. The internship will take place at the Nano-Innov premises of CEA Saclay, in a research laboratory specialized in particular in code analysis, compilation toolchains and embedded systems for massively parallel architectures.
  2. French language
    Analysis of shared-variable access paths for online anticipation of placement and routing.
    Master's internship available for 2015 at CEA, Saclay Centre. [www]
    Today, processors with several hundred cores, such as Intel's Xeon Phi (64 cores) or Kalray's MPPA (256 cores), promise significant performance gains together with reduced power consumption. These gains depend heavily on the programmability of the chips, but also on the placement and routing strategies used when the application is deployed. One way to achieve execution performance is to reduce the access time to shared memory. To do so, it suffices to reduce the costly messages of the shared-memory access protocol (e.g. blocking messages in synchronous programming, or control messages) while placing data as close as possible to the task. In the LaSTRE laboratory, earlier work focused on developing a multi-protocol compilation toolchain that associates a given protocol with each shared data item. Protocols are assigned either statically or dynamically. In this internship, we would like to equip this compilation toolchain with analysis tools that characterize the access paths to shared memory in order to anticipate placement and routing. The intern will identify and formalize the types of shared-memory access in order to help identify the best placement and routing strategy. The work will proceed in three phases: 1. State of the art on placement and routing strategies, 2. Formalization of the dependency analyses that would facilitate placement and routing, 3. Implementation of an analysis prototype. The internship will take place at the Nano-Innov premises of CEA Saclay, in a research laboratory specialized in particular in code analysis, compilation toolchains and embedded systems for massively parallel architectures.

Activities

  1. English language
    Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY 2015).
    Loïc Cudennec and Stéphane Louise. Held in conjunction with the International Conference on Computational Science (ICCS 2015), Reykjavik, Iceland, June 2015. Workshop organizer. [www]
    Massively parallel processors have entered high performance computing architectures, as well as embedded systems. In June 2014, the TOP500 number one system (Tianhe-2) featured the 57-core Intel Xeon Phi processor. The number of cores on a chip is expected to rise in the coming years, as shown by the ITRS trends: other examples include the Kalray MPPA 256-core chip, the 63-core Tilera GX processor and even the crowd-funded 64-core Parallella Epiphany chip. In this context, developers of parallel applications, including heavy simulations and scientific calculations, will undoubtedly have to cope with many-core processors from the early design steps. In the past two sessions of the Alchemy workshop, held together with the ICCS meeting, we presented significant contributions on the design of many-core processors, on both the hardware and the software programming environment sides, as well as some industrial-grade application case studies. In this 2015 session, we seek academic and industrial works that contribute to the design and the programmability of many-core processors.
  2. French language
    Exploitation de motifs d'accès mémoire et amélioration de la coopération des caches dans les architectures many-coeurs (Exploiting memory access patterns and improving cache cooperation in many-core architectures).
    Loïc Cudennec and Safae Dahmani. Journée Méthodes, Outils et Architectures pour les Mémoires (GDR SOCSIP), Lip6, Paris, France, November 2012. Talk. [www,pdf,pdf]
    Many-core architectures are made up of several hundred, or even several thousand, cores grouped on a single chip and connected by a network-on-chip. In this context, some classical techniques used in multiprocessors to ensure cache coherence are limited by scalability. In this talk we introduce two contributions that improve the behavior of the cache coherence protocol as the number of compute cores grows. The first relies on exploiting memory access patterns to anticipate upcoming data accesses. The second adapts a cooperative caching protocol to improve its behavior under heavy load.

List of publications

Patents

  1. French language
    Multi-core System and Method of Data Consistency.
    Loïc Cudennec, Jussara Marandola, Jean-Thomas Acquaviva and Jean-Sylvain Camier. FR2970794 (A1), CEA, January 2011. [www]
    The subject of the invention is a system comprising a plurality of cores and a communication bus (302) allowing the cores to communicate with one another, a core being composed of a processor (303) and of at least one cache memory area (304). At least one core comprises a table of patterns (306) in which are stored a set of patterns, a pattern corresponding to a series of memory addresses associated with a digital data item composed of binary words stored at these addresses. This core furthermore comprises means for mapping one of the memory addresses AdB of a digital data item to a pattern associated therewith when said core needs to access this data item as well as means for transmitting a unique message for access to a digital data item located in the cache memory of at least one other core of the system, said message including the memory addresses making up the pattern of the data item sought.

International conferences

  1. English language
    Using the Spring Physical Model to Extend a Cooperative Caching Protocol for Many-Core Processors.
    Safae Dahmani, Loïc Cudennec, Stéphane Louise and Guy Gogniat. Proceedings of the IEEE 8th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-14), Aizu-Wakamatsu, Japan, September 2014. To appear. [www]
    As the number of embedded cores grows, the off-chip memory wall becomes an overwhelming bottleneck. As a consequence, efficiently exploiting on-chip data storage is increasingly important. In a previous work, we proposed a data sliding mechanism that allows data to be stored in the closest neighborhood, even under heavy stress loads. However, each cache block is allowed to migrate only once to a neighbor's cache (e.g. 1-Chance Forwarding). In this paper, we propose an extension of our mechanism in order to expand the cooperative caching area. Our work is based on an adaptive physical model, where each cache block is considered as a mass connected to a spring. This technique constrains data migration according to the spring constant and the difference in workload between cores. This adaptive data sliding approach leads to a balanced spread of data on the chip and therefore improves on-chip storage. On-chip data access has been evaluated using an analytical approach. Results show that the extended data sliding increases the global cache hit rate on the chip, especially in the context of juxtaposed hot spots.
  2. English language
    Generating Code and Memory Buffers to Reorganize Data on Many-core Architectures.
    Loïc Cudennec, Paul Dubrulle, François Galea, Thierry Goubier and Renaud Sirdey. Proceedings of the Second International Workshop on Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY 2014), Held in conjunction with the International Conference on Computational Science (ICCS 2014), Cairns, Australia, June 2014. [www]
    The dataflow programming model has been shown to be a relevant approach to efficiently run massively parallel applications over many-core architectures. In this model, some particular built-in agents are in charge of data reorganizations between user agents. Such agents can Split, Join and Duplicate data onto their communication ports. They are widely used in signal processing, for example. These system agents, and their associated implementations, are of major importance when it comes to performance, because they can sit on the critical path (think of Amdahl's law). Furthermore, a particular data reorganization can be expressed by the developer in several ways, some of which may lead to inefficient solutions (mostly unneeded data copies and transfers). In this paper, we propose several strategies to manage data reorganization at compile time, with a focus on indexed accesses to shared buffers to avoid data copies. These strategies are complementary: they ensure correctness for each system agent configuration, as well as performance when possible. They have been implemented within the Sigma-C industry-grade compilation toolchain and evaluated over the Kalray MPPA 256-core processor.
  3. English language
    Adaptive Cooperative Caching for Many-cores systems.
    Safae Dahmani, Loïc Cudennec and Guy Gogniat. Proceedings of the Ninth International Summer School on Advanced Computer Architecture and Compilation for High-Performance and Embedded Systems (HiPEAC ACACES 2013), Fiuggi, Italy, July 2013. [www]
    Nowadays, many-core processors are emerging as a serious alternative to regular processors by offering high computing performance while controlling power consumption. As in every complex parallel and distributed system, data sharing and data consistency are of major importance. In this paper we propose a short overview of one particular coherency protocol, the data sliding mechanism, based on cooperative caching. We also present some perspectives on this work and its global integration within a complete system.
  4. English language
    Throughput constrained parallelism reduction in cyclo-static dataflow applications.
    Sergiu Carpov, Loïc Cudennec and Renaud Sirdey. Proceedings of the 10th International Conference on Computational Science (ICCS 2013), Barcelona, Spain, June 2013. [www]
    This paper deals with semantics-preserving parallelism reduction methods for cyclo-static dataflow applications. Parallelism reduction is the process of fusing equivalent actors. The principal objectives of parallelism reduction are to decrease the memory footprint of an application and to increase its execution performance. We focus on parallelism reduction methodologies constrained by application throughput. A generic parallelism reduction methodology is introduced. Experimental results are provided to assess the performance of the proposed method.
  5. English language
    Extended Cyclostatic Dataflow Program Compilation and Execution for an Integrated Manycore Processor.
    Pascal Aubry, Pierre-Edouard Beaucamps, Frédéric Blanc, Bruno Bodin, Sergiu Carpov, Loïc Cudennec, Vincent David, Philippe Dore, Paul Dubrulle, Benoît Dupont de Dinechin, François Galea, Thierry Goubier, Michel Harrand, Samuel Jones, Jean-Denis Lesage, Stéphane Louise, Nicolas Morey Chaisemartin, Thanh Hai Nguyen, Xavier Raynaud and Renaud Sirdey. Proceedings of the First International Workshop on Architecture, Languages, Compilation and Hardware support for Emerging ManYcore systems (ALCHEMY 2013), Held in conjunction with the International Conference on Computational Science (ICCS 2013), Barcelona, Spain, June 2013. [www]
    The ever-growing number of cores in embedded chips emphasizes more than ever the complexity inherent to parallel programming. To solve these programmability issues, there is a renewed interest in the dataflow paradigm. In this context, we present a compilation toolchain for the Sigma-C language, which allows the hierarchical construction of stream applications and the automatic mapping of these applications to an embedded manycore target. As a demonstration of this toolchain, we present an implementation of an H.264 encoder and evaluate its performance on the embedded manycore chip MPPA.
  6. English language
    Introducing a Data Sliding Mechanism for Cooperative Caching in Manycore Architectures.
    Safae Dahmani, Loïc Cudennec and Guy Gogniat. Proceedings of the 18th International Workshop on High-Level Parallel Programming Models and Supportive Environments (HIPS 2013), Held in conjunction with IPDPS, Boston, Massachusetts, USA, May 2013. [www]
    In this paper we propose a new cooperative caching method that improves the cache miss rate in manycore micro-architectures. The work is motivated by some limitations of recent adaptive cooperative caching proposals. Elastic Cooperative Caching (ECC) is a dynamic memory partitioning mechanism that allows cache to be shared across cooperative nodes according to the application behavior. However, it is mainly limited by the cache eviction rate in highly stressed neighborhoods. Another system, Adaptive Set-Granular Cooperative Caching (ASCC), is based on finer set-based mechanisms for better adaptability. However, heavy localized cache loads are not efficiently managed. In this context, we propose a cooperative caching strategy that consists in sliding data through direct neighbors. When a cache receives a request to store a neighbor's private block, it spills its least recently used private data to a close neighbor. Thus, solicited saturated nodes slide local blocks to their respective neighbors to always provide free cache space. We also propose a new Priority-based Data Replacement policy to decide efficiently which blocks should be spilled, and a new mechanism to choose the host destination, called Best Neighbor selector. A first analytic performance evaluation shows that the proposed cache management policies halve the average global communication rate. As frequent accesses are concentrated in the neighboring zones, on-chip traffic is improved. Finally, our evaluation shows that the cache miss rate is improved: each tile keeps its most frequently accessed data one hop away, instead of evicting them off-chip. This notably reduces the cache miss rate when the cooperative zone is heavily solicited, as shown in the experiments.
  7. English language
    Enhancing Cache Coherent Architecture With Access Patterns for Embedded Manycore Systems.
    Jussara Marandola, Stéphane Louise, Loïc Cudennec, Jean-Thomas Acquaviva and David Bader. Proceedings of the International Symposium on System-on-Chip (SoC 2012), Tampere, Finland, October 2012. [www]
    One of the key challenges in advanced micro-architecture is to provide high performance hardware components that work as application accelerators. In this paper, we present a Cache Coherent Architecture that optimizes memory accesses to patterns using both a hardware component and a software API (specialized instructions). The high performance hardware component in our context is aimed at CMP (Chip Multi-Processing) and MPSoC (Multiprocessor System-on-Chip). A large number of applications targeted at embedded systems are known to read and write data in memory following regular memory access patterns. In our approach, memory access patterns are fed to a specific hardware accelerator that can be used to optimize cache consistency mechanisms by prefetching data and thus reducing the number of memory transactions. In this paper, we analyze this component and its associated protocol, which enhances a cache coherency protocol to perform speculative requests when access patterns are detected. The main contributions are the description of the system architecture, providing a high-level overview of a specialized hardware component, and the associated transaction message model. We also provide a first evaluation of our proposal, using code instrumentation of a parallel application.
  8. English language
    Parallelism Reduction Based on Pattern Substitution in Dataflow Oriented Programming Languages.
    Loïc Cudennec and Renaud Sirdey. Proceedings of the 9th International Conference on Computational Science (ICCS 2012), p.146-155, Omaha, Nebraska, USA, June 2012. [www]
    In this paper, we present a compiler extension for applications targeting high performance embedded systems. It analyzes the graph of a dataflow application in order to adapt its degree of parallelism. Our approach consists in the detection and the substitution of built-in patterns in the dataflow. Modifications applied to the graph do not alter the semantics of the application. A parallelism reduction engine is also described to perform an exhaustive search for the best reduction. Our proposal has been implemented within an industry-grade compiler for the Sigma-C dataflow language. It shows that for dataflow applications, the parallelism reduction extension helps the user focus on the algorithm by hiding all parallelism tuning considerations. Experiments demonstrate the accuracy and the performance of the reduction engine for both synthetic and real applications.
  9. English language
    Co-Designed Cache Coherency Architecture for Embedded Multicore Systems.
    Jussara Marandola and Loïc Cudennec. Proceedings of the IP-Embedded System Conference and Exhibition (IP-SoC 2011), Grenoble, France, December 2011. [www]
    One of the key challenges in chip multi-processing is to provide a programming model that manages cache coherency in a transparent and efficient way. A large number of applications designed for embedded systems are known to read and write data following memory access patterns. Memory access patterns can be used to optimize cache consistency by prefetching data and reducing the number of memory transactions. In this paper, we present the round-robin method applied to a baseline coherency protocol and an initial analysis of a hybrid protocol that performs speculative requests when access patterns are detected. We also propose to manage patterns through a dedicated hardware component attached to each core of the processor.
  10. English language
    Building Hierarchical Grid Storage Using the Gfarm Global File System and the JuxMem Grid Data-Sharing Service.
    Gabriel Antoniu, Loïc Cudennec, Majd Ghareeb and Osamu Tatebe. Proceedings of the 14th International Euro-Par Conference (Euro-Par 2008), p.456-465, Las Palmas de Gran Canaria, Spain, August 2008. [www]
    As more and more large-scale applications need to generate and process very large volumes of data, the need for adequate storage facilities is growing. It becomes crucial to efficiently and reliably store and retrieve large sets of data that may be shared at the global scale. Based on previous systems for global data sharing (global file systems, grid data-sharing services), this paper proposes a hierarchical approach for grid storage, which combines the access efficiency of RAM storage with the scalability and persistence of the global file system approach. Our proposal has been validated through a prototype that couples the Gfarm file system with the JuxMem data-sharing service. Experiments on the Grid'5000 testbed confirm the advantages of our approach.
  11. English language
    CoRDAGe: towards transparent management of interactions between applications and resources.
    Loïc Cudennec, Gabriel Antoniu and Luc Bougé. Proceedings of the International Workshop on Scalable Tools for High-End Computing (STHEC 2008), p.13-24, Kos, Greece, June 2008. [www]
    Nowadays, large-scale, grid-aware applications are intended to run for days or even weeks over hundreds or thousands of nodes. This requires new, and often painful, operations for the user in charge of deployment and monitoring. We claim that the applications should themselves manage their run in an autonomic way, by requesting new resources on demand. In this paper, we introduce CoRDAGe, a third-party tool standing between applications and lower-level grid management tools. It provides generic and application-specific facilities to dynamically expand and retract the deployment of a grid-aware application according to its actual needs. A prototype has been implemented and preliminary testing has been conducted on the Grid'5000 testbed.
  12. French language
    Vers la classification darwinienne d'un processeur fossile (Towards the Darwinian classification of a fossil processor).
    Xavier Le Guillou and Loïc Cudennec. Proceedings of the Third Review of April Fool's day Transactions (RAFT'2008), p.9-21, Grenoble, France, April 2008. [www]
    Evolutionists and creationists clash on every front to impose on the whole community their ideas about the disappearance of ancient species. The field of computer science research, and paleoprocessology in particular, is all the more sensitive to this debate because the expansion of laboratories on campuses is revealing a large number of fossils that are still unidentified. This article, a genuine case study, presents a protocol-driven experimental approach aimed at classifying an unidentified processum sorórem fossilis.
  13. French language
    Un service hiérarchique distribué de partage de données pour grille (A distributed hierarchical data-sharing service for grids).
    Loïc Cudennec. Proceedings of the Rencontres francophones du Parallélisme (RenPar'18), Fribourg, Switzerland, February 2008. [www]
    The growing storage requirements of scientific applications (in atomic energy, cosmology, genetics, etc.) motivate the search for suitable solutions for large-scale data management. It is becoming essential to offer efficient and reliable access to large amounts of shared data. We believe that, in contrast to the current approach based on explicit data transfers, using a transparent data access model simplifies the programming model. Such an approach is illustrated through the notions of a global-scale distributed file system and a grid-scale data-sharing service. This paper presents a hierarchical storage system for grids that takes advantage of the speed of accesses to physical memory and of the persistence of disk storage. Our proposal has been validated by a prototype that couples the Gfarm file system with the JuxMem data-sharing service.
  14. English language
    Building a DBMS on top of the JuxMem Grid Data-Sharing Service.
    Abdullah Almousa Almaksour, Gabriel Antoniu, Luc Bougé, Loïc Cudennec and Stéphane Gançarski. Proceedings of the HiPerGRID Workshop (HiPerGRID 2007), Brasov, Romania, September 2007. [www]
    We claim that building a distributed DBMS on top of a general-purpose grid data-sharing service is a natural extension of previous approaches based on the distributed shared memory paradigm. The approach we propose consists in providing the DBMS with transparent, persistent and fault-tolerant access to the stored data, within an unstable, volatile and dynamic environment. The DBMS is thus relieved of any concern regarding the dynamic behavior of the underlying nodes. We report on a feasibility study carried out with our JuxMem grid data-sharing service built on top of the JXTA peer-to-peer platform.
  15. English language
    Performance scalability of the JXTA P2P framework.
    Gabriel Antoniu, Loïc Cudennec, Mathieu Jan and Mike Duigou. Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS 2007), p.108, Long Beach, California, USA, March 2007. [www]
    Features of the P2P model, such as scalability and volatility tolerance, have motivated its use in distributed systems. Several generic P2P libraries have been proposed for building distributed applications. However, very few experimental evaluations of these frameworks have been conducted, especially at large scales. Such experimental analyses are important, since they can help system designers to optimize P2P protocols and better understand the benefits of the P2P model. This is particularly important when the P2P model is applied to special use cases, such as grid computing. This paper focuses on the scalability of two main protocols proposed by the JXTA P2P platform. First, we provide a detailed description of the underlying mechanisms used by JXTA to manage its overlay and propagate messages over it: the rendezvous protocol. Second, we describe the discovery protocol used to find resources inside a JXTA network. We then report a detailed, large-scale, multi-site experimental evaluation of these protocols, using the nine clusters of the French Grid'5000 testbed.
  16. English language
    A practical evaluation of a data consistency protocol for efficient visualization in grid applications.
    Gabriel Antoniu, Loïc Cudennec and Sébastien Monnet. Proceedings of the International Workshop on High-Performance Data Management in Grid Environment (HPDGrid 2006), Rio de Janeiro, Brazil, July 2006. [www]
    Data visualization is important in the context of grid applications, especially when successive refinements are iteratively realized based on intermediate results. We mainly focus on code coupling grid applications, structured as a set of distributed, autonomous, weakly-coupled codes. We consider the case where the codes are able to interact using the abstraction of a shared data space. In previous work, we have proposed an efficient visualization scheme by introducing a new operation called relaxed read, as an extension to the entry consistency model. This operation can efficiently take place without locking, in parallel with write operations. On the other hand, the user has to relax the consistency constraints, and accept slightly older versions of the data, whose "freshness" can however still be controlled. In this paper, we discuss and extensively evaluate the proposed consistency protocol, whose efficiency is clearly demonstrated by our experimental results.
  17. English language
    Extending the entry consistency model to enable efficient visualization for code-coupling grid applications.
    Gabriel Antoniu, Loïc Cudennec and Sébastien Monnet. Proceedings of the 6th IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGrid 2006), p.552-555, Singapore, Singapore, May 2006. [www]
    This paper addresses the problem of efficient visualization of shared data within code coupling grid applications. These applications are structured as a set of distributed, autonomous, weakly-coupled codes. We focus on the case where the codes are able to interact using the abstraction of a shared data space. We propose an efficient visualization scheme by adapting the mechanisms used to maintain the data consistency. We introduce a new operation called relaxed read, as an extension to the entry consistency model. This operation can efficiently take place without locking, in parallel with write operations. We discuss the benefits and the constraints of the proposed approach.
  18. French language
    Extension du modèle de cohérence à l'entrée pour la visualisation dans les applications de couplage de codes sur grilles (Extending the entry consistency model for visualization in code-coupling grid applications).
    Loïc Cudennec and Sébastien Monnet. Proceedings of the Journées francophones sur la Cohérence des Données en Univers Réparti (CDUR 2005), Paris, France, November 2005. [www]
    This paper addresses the problem of visualizing shared data in code-coupling applications on grids. We propose to improve the efficiency of visualization by acting on the mechanisms that manage replicated data, and more specifically on the consistency protocol. The notion of relaxed read is introduced as an extension of the entry consistency model. This new type of operation can be performed without acquiring a lock, in parallel with writes. In return, the user relaxes the constraints on data freshness and accepts reading slightly older versions, whose staleness is nevertheless controlled. The implementation of this approach within the JuxMem grid data-sharing service shows considerable gains over a classical implementation based on reads that acquire a lock.

Ph.D. Thesis

  1. French language
    CoRDAGe : Un service générique de co-déploiement et redéploiement d'applications sur grilles (CoRDAGe: a generic service for co-deploying and re-deploying applications on grids).
    Loïc Cudennec. Ph.D. thesis. Université de Rennes I, INRIA Centre Bretagne Atlantique, IRISA, January 2009. [www]
    Pooling the physical resources spread across universities, institutes and companies has led to the emergence of computing grids. These dynamic infrastructures are well suited to scientific applications with large needs in computing power and storage space. One of the major challenges for computing grids remains making them simpler to use. Unlike deploying applications on a centralized infrastructure, deploying on a grid requires many tedious tasks from the user: resource selection, program transfer and execution monitoring are all left to the user. Today, many works propose to automate these steps in simple cases. However, very few can handle more complex deployments, such as re-deploying part of the application while it is running or the coordinated deployment of several applications. In this thesis, we propose a model to handle the dynamic deployment of applications on computing grids. This model aims to offer two main features. The first is the translation of high-level, application-specific actions into low-level operations related to resource management on the grid. The second is the pre-planning of application deployments, re-deployments and co-deployments on physical resources. The model satisfies three properties. It makes resource management transparent to the user. It offers actions specific to the needs of the application. Finally, it is non-intrusive, limiting the constraints placed on the application's programming model. An architecture proposal named CORDAGE illustrates this model for the co-deployment and re-deployment of applications. CORDAGE was developed in connection with the OAR reservation tool and the ADAGE deployment tool. The prototype was validated with the JXTA peer-to-peer platform, the JUXMEM data-sharing service and the GFARM distributed file system. Our approach was evaluated on the GRID'5000 experimental grid.
  2. English language
    Experimentations With CoRDAGe, A Generic Service For Co-Deploying and Re-Deploying Applications On Grids.
    Loïc Cudennec, Gabriel Antoniu and Luc Bougé. INRIA Research Report, RR-7086, INRIA Centre Bretagne Atlantique, February 2009. [www]
    Computer grids are made of thousands of heterogeneous physical resources that belong to different administration domains. This makes the use of the grid very complex. In this paper, we focus on deploying distributed applications at a large scale. As the application requirements may often not be anticipated, dynamic re-deployment is needed; if various applications have to co-operate within a workflow, they should also be co-deployed in a consistent way. In a previous paper, we have described the CORDAGE deployment model and its architecture. It meets the three properties of transparency, versatility, and neutrality. We report in this paper on its application to a real co-deployment over the GRID'5000 experimental platform, using different configurations, including multiple clients, multiple applications and multiple grid sites.

Master Thesis

  1. French language
    Modèles et protocoles de cohérence des données en environnement volatile.
    Loïc Cudennec. Final-year project report. INSA de Rennes, INRIA Centre Bretagne Atlantique, IRISA, June 2005. [www]
    This report addresses the problem of visualizing shared data in code-coupling applications on grids. We propose to improve the efficiency of visualization by acting on the mechanisms that manage replicated data, and more specifically on the consistency protocol. The notion of relaxed read is introduced as an extension of the entry consistency model. This new type of operation can be performed without acquiring a lock, in parallel with writes. In return, the user relaxes the constraints on data freshness and accepts reading slightly older versions, whose staleness is nevertheless bounded. The implementation of this approach within the JuxMem grid data-sharing service shows considerable gains over a classical implementation based on lock-acquiring reads.
  2. French language
    Gestion de la cohérence des données dans les systèmes distribués.
    Loïc Cudennec. Bibliographic study. INSA de Rennes, INRIA Centre Bretagne Atlantique, IRISA, February 2005. [ftp]
    This report presents the state of the art in data sharing in distributed environments. The concepts of distributed shared memory, consistency model, peer-to-peer network and fault tolerance are developed in order to understand the mechanisms that can come into play on a grid-type architecture. The JuxMem data-sharing service for grids is introduced as a synthesis of the notions presented in this report.

Supervising

Ph.D. Thesis

  1. New data consistency models and protocols for multi-scale architectures, from many-core to smartgrid.
    Safae Dahmani. With Guy Gogniat (Advisor). Ph.D. Thesis, Université de Bretagne Sud / CEA, 2012-2015
    This thesis addresses challenges regarding data consistency within distributed systems and massively distributed systems. We observe a trend towards the convergence of the many-core and smart grid application domains, at least for some levels of these hierarchical infrastructures. This thesis aims at approaching in a systemic way the problem of data consistency within many-core architectures (hundreds of cores shipped together with memory and network on a single chip) and smart grids (information systems designed to manage power grids). After an in-depth survey of data consistency applied to parallel and distributed architectures (multi-core, cluster and grid), as well as to shared data systems (distributed shared systems, peer-to-peer systems, distributed databases), particular emphasis shall be put on the choice of an appropriate data consistency model that fits the needs of parallel applications while scaling well. The proposed data consistency protocol shall be based on protocol and software innovation (down to hardware), or wisely take advantage of previous work in closely related research fields. Several directions can be explored: exploiting regular access with application profiling, making a hierarchical protocol relying on group communications, relaxing constraints on data freshness, specializing the network on chip and hardware devices to manage protocol metadata, modifying task placement to decrease protocol messages. Some prototyping work will be carried out to demonstrate the relevance of the solution. This will be achieved with synthetic and real applications on a simulation platform.

Master Thesis

  1. French language
    Réalisation d'un modèle de communication pour mémoire sur puce afin d'évaluer le coût d'accès aux données dans une plate-forme many-coeurs à caches coopératifs.
    Hamza Chaker. With Safae Dahmani, Guy Gogniat and Martha Johanna Sepulveda. Master Thesis, LabSTICC, Université de Bretagne-Sud / CEA, 2014
  2. French language
    Une nouvelle stratégie de glissement de données pour les caches élastiques dans les architectures many-coeurs.
    Safae Dahmani. Master Thesis, Ecole normale supérieure de Cachan / CEA, 2012 [www]
    This report proposes a solution to the limitation of the elastic cache mechanism in a stressed neighborhood. The proposed data sliding mechanism spreads data step by step over a wider neighborhood. All local data originating from a stressed node are stored in its direct neighborhood. The solicited neighbors do the same for their own local data, keeping them within their respective neighborhoods. This technique keeps the most frequently used local data one hop away from their home node instead of evicting them off-chip. This reduces the access cost to these data as well as the number of cache misses. In addition to the sliding principle, our contribution relies on two main mechanisms: the Best Neighbor technique for choosing the destination neighbor, and the priority-based replacement protocol for selecting the block to transfer to the neighborhood. A first analytical study shows a significant improvement in on-chip traffic. A reduction in accesses to the Home Node has also been observed. Further work is planned to evaluate the gains in latency and energy consumption.
  3. English language
    A Decoupled Approach to Decentralize the Gfarm Metadata Management.
    André Lage Freitas. With Luc Bougé and Gabriel Antoniu. Master Thesis, University of Rennes 1, 2008 [ftp]
    Current DFS metadata management schemes distribute the metadata and workload among servers in order to prevent potential bottlenecks in the system. However, these solutions are often based on complex techniques to provide metadata consistency and lookup, which increases system complexity and implementation difficulty. In order to better adapt DFS metadata management to the grid context by simplifying its design and facilitating its implementation, we propose a decoupled metadata management design that separates metadata storage management from metadata server management. This approach was used to implement a decentralized Gfarm metadata management which leverages the Grid Data-Sharing Service (GDS) implemented by JuxMem. Gfarm metadata management uses JuxMem to share metadata transparently while providing consistency.
  4. English language
    Scaling Distributed Database Management Systems by using a Grid-based Storage Service.
    Silviu-Marius Moldovan. With Luc Bougé and Gabriel Antoniu. Master Thesis, University of Rennes 1, 2008 [ftp]
    This report deals with databases with distributed storage, focusing on the insufficient storage space issue. In order to make these systems more scalable, the advantages offered by grids can be taken into consideration. Thus, an approach to create an interface between a database system and a grid-based data storage service is presented.
  5. English language
    Enhancing the JuxMem Grid Data Sharing Service with Persistent Storage using the Gfarm Global File System.
    Majd Ghareeb. With Luc Bougé and Gabriel Antoniu. Master Thesis, University of Rennes 1, 2007
  6. English language
    A Grid Database Management System based on DSM and P2P Hybrid Paradigm.
    Abdullah Almaksour. With Luc Bougé and Gabriel Antoniu. Master Thesis, INSA de Lyon, 2007 [www]
    The PARIS team at IRISA works on the design and implementation of a Grid Data-Sharing Service for computing grids, called JuxMem. This service provides the illusion of a shared memory on top of an infrastructure that is 1) physically distributed on a large scale, and 2) volatile, because of possible failures but also of dynamic resource insertions and removals. The goal of this work is to design and implement peer-to-peer services allowing disseminated data sharing, data storage, data querying and handling at a large scale, based on simple distributed database management system schemes. In this work, we have proposed a distributed memory database management system architecture adapted to grid infrastructures, based on the JuxMem infrastructure and on the JuxMem API. Database representation, table fragmentation and table indexing are the main problems addressed in our implementation. This work involves three domains: Distributed Shared Memory systems (DSM), Peer-to-peer systems (P2P) and Distributed memory-based database management systems (DMMDB).
  7. English language
    A Monitoring Tool to Manage the Dynamic Resource Requirements of a Grid Data Sharing Service.
    Voichita Almasan. With Luc Bougé and Gabriel Antoniu. Master Thesis, University of Rennes 1, 2006 [ftp]
    JuxMem is an infrastructure for data sharing, to be used by distributed applications which run on top of the grid. These distributed applications can be seen as clients of JuxMem. Consider a situation in which an application needs to allocate space in which to store pieces of data it will need later during its execution. When requesting data allocation, JuxMem’s clients can pass additional parameters to JuxMem that reflect the degree and mode of replication for that piece of data. In the current version, clients can specify on how many clusters they want their piece of data to be stored and how many copies should be kept on each of these clusters. Consequently, JuxMem will attempt to “reserve” as many JuxMem storage units as needed for the size of the data, on the number of nodes specified by the client, for each of the clusters required by the client. Our contribution is to study the ways JuxMem could interact with grid resource management systems and deployment tools.

Software

  1. English language
    CEA Sigma-C / Kalray MPPA AccessCore.
    Industrial software development toolkit. [www]
    The Sigma-C dataflow programming language has been introduced to ease development for massively parallel and distributed architectures, with a focus on manycore processors. It has been designed by CEA LIST, in collaboration with the Kalray fabless company, and is now part of the MPPA AccessCore development toolkit. In this project I took part in the design and implementation of several steps of the Sigma-C compilation toolchain: parallelism instantiation and checking, parallelism reduction, system agent compilation for data reorganisation and memory access patterns, and large graph visualization. I was also in charge of the toolchain integration.
  2. English language
    CoRDAGe.
    Free Software, LGPL License. [www]
    CoRDAGe is a co-deployment and re-deployment tool for grid applications. It interfaces distributed applications with the grid middleware in charge of node reservation and deployment, bringing dynamicity throughout the whole execution. CoRDAGe is currently able to (re)deploy the JuxMem data sharing service using the OAR reservation tool and the ADAGE deployment tool. Interactions between grid applications and CoRDAGe rely on the XML-RPC protocol specification. A set of specific actions has to be written in C++ following the CoRDAGe framework to provide support for your applications.
  3. English language
    TABI.
    Free Software, LGPL License. [www]
    TABI is a Tree Automata Browsing Interface for visual automata inspection. This tool lets users interactively and graphically build representatives of the language recognized by a tree automaton.
  4. English language
    doodle4gift.
    Free Software, LGPL License. [www]
    Doodle4Gift is a PHP5 application that helps people make wish lists and contribute to offering gifts. It is well suited to sharing gift ideas for birthdays and weddings, or to publishing your own wish list. Written in PHP5, using an XML database (no SQL).
  5. English language
    media2html.
    Free software, GNU General Public License. [www]
    This program generates a static HTML5/CSS3 website from a regular file system structure, focusing on media files (images, audio files, video files and text files). In each directory, an HTML file is created, embedding an audio-video player and a playlist made of all the compatible media files found at this level. An explorer view is also provided to open and download files, as well as to navigate to the other directories. This program makes use of jPlayer, ExifTool and ImageMagick.

Hobbies

  1. Music (as Logan Dataspirit).
    Electronic music composition and production since the late 1990s. Featured on more than 40 albums and EPs released on a dozen labels worldwide.
  2. Sport.
    Volley-ball (14 years), Handball (6 years).