------------------- Released version 1.3 -----------------------------

Major features:

- Basic support for the K Computer and Fujitsu FX10 systems added. The
  Tofu network topology will be supported in a subsequent release.
  Note that some C++ OpenMP programs fail during measurement
  initialization for unknown reasons.
- Add support for instrumenting programs which use SHMEM library calls
  for one-sided communication. Score-P currently supports the SHMEM
  implementations of Cray, Open MPI, OpenSHMEM, and SGI.
- Basic support for POSIX thread instrumentation. Supported POSIX
  thread routines are pthread_create, pthread_join,
  pthread_mutex_init, pthread_mutex_destroy, pthread_mutex_lock,
  pthread_mutex_trylock, pthread_mutex_unlock, pthread_cond_init,
  pthread_cond_destroy, pthread_cond_signal, pthread_cond_broadcast,
  pthread_cond_wait, and pthread_cond_timedwait. The following thread
  management functions are currently not supported and will abort the
  program: pthread_exit and pthread_cancel. Using pthread_detach will
  cause the program to fail if the detached thread is still running
  after the end of main. These limitations will be addressed in an
  upcoming version of Score-P. Note that every thread creation must be
  instrumented.

Features and improvements:

- Use Process Manager Interface (PMI) to get fine-grained information
  about the system topology on Cray machines.
- CUBE profiles can now be written with tuple values containing the
  sum, minimum, maximum, number of samples, and sum of squares.
- The new SIONlib integration of OTF2 extends the support of writing
  SION traces to all multi-process paradigms, not only MPI. However,
  only pure multi-process measurements are supported for now: no
  threads, no CUDA, and no non-CPU metrics. Score-P itself no longer
  depends on SIONlib; only OTF2 does now. The configure option
  '--with-sionlib' (formerly '--with-sionconfig') is passed to OTF2.
  As part of this integration the measurement configuration variable
  'SCOREP_TRACING_NLOCATIONS_PER_SION_FILE' was renamed to
  'SCOREP_TRACING_MAX_PROCS_PER_SION_FILE' to clarify that Score-P can
  only distribute whole processes into a multi-file SION trace.
- Improved initialization of adapters, which reduces the number of
  libraries that need to be linked into the application.
- Extended the TAU adapter to accept location properties, which are
  location-specific metadata given as key/value pairs.
- The option --thread=<paradigm>[:<variant>] lets users choose the
  threading model and fine-tune certain aspects. Currently OpenMP and
  POSIX threads are supported with either --thread=omp or
  --thread=pthread. For OpenMP we provide the two variants
  --thread=omp:pomp_tpd (default) and --thread=omp:ancestry. The former
  tells OPARI2 to insert code for thread tracking, whereas the latter
  uses the ancestry functions of OpenMP 3.0 and later to accomplish the
  same task.

User tools and API improvements and changes:

- Improved automatic MPI detection in the instrumenter (helpful on
  Cray, as cc/CC/ftn is the compile command for both MPI and non-MPI).
- Changed paradigm selection in the instrumenter to match the
  selection options in the scorep-config tool. Thus, the
  --mpp=<paradigm> and --thread=<paradigm> flags were introduced in the
  instrumenter to select the multi-process paradigm and the threading
  paradigm. The old options --mpi, --nompi, --openmp, --noopenmp are
  deprecated and no longer documented.
- Added handling for special characters, like space, in file names and
  path names. However, there are still some limitations when using
  special characters: The PDT parser cannot deal with these
  characters and, thus, fails if PDT instrumentation is enabled and
  special characters appear. Furthermore, compilation fails when
  double quotes appear in source file names and preprocessing is
  enabled.
- Unified naming of macros in the user adapter. In C/C++ the macros to
  define global region handles (SCOREP_GLOBAL_REGION_DEFINE and
  SCOREP_GLOBAL_REGION_EXTERNAL) and in Fortran the parameter macros
  (SCOREP_PARAMETER_DEFINE, SCOREP_PARAMETER_INT64,
  SCOREP_PARAMETER_UINT64, SCOREP_PARAMETER_STRING) got the prefix
  SCOREP_USER instead of only SCOREP.
- Added selection of the mutex locking mechanism: the parameter
  --mutex=<locking> switches between the locking mechanisms known to
  the measurement system (omp, pthread, pthread:spinlock, pthread:wrap).
- Improved event size estimation in scorep-score using otf2-estimator.
- Install Cube remap specification file and provide its location via
  the scorep-config tool.
- The scorep-info tool can now show known and open issues regarding
  the measurement with Score-P. It is highly advised to consult this
  list before reporting problems.

CUDA support improvements and changes:

- Added support for CUDA 5.5 and CUDA 6.0: The CUPTI activity buffer
  handling has changed. The SCOREP_CUDA_BUFFER_CHUNK environment
  variable has therefore been introduced (see user documentation). The
  default size for SCOREP_CUDA_BUFFER was changed to '1M'.
- New options for SCOREP_CUDA_ENABLE:
  'references'   : track references between CUDA host and device
                   activities in the OTF2 trace
  'flushatexit'  : forces pending CUDA activities to be flushed at program
                   exit (prevents records from being dropped in OpenACC
                   programs)
  'kernel_serial': serialize recording of (potentially concurrent) kernels
- Obsolete options for SCOREP_CUDA_ENABLE:
  'concurrent'  : recording of (potentially concurrent) kernels is the
                  default
  'stream_reuse': feature has been removed
  'device_reuse': feature has been removed

- Added support for runtime filtering of CUDA device and host
  activities.

Bugfixes:

- When using the Intel compiler, functions from shared libraries now
  appear in the measurement output. Previously we inspected the symbol
  table of the executable and evaluated the filtering on all functions
  in the executable. Thus, compiler-instrumented functions from shared
  libraries were automatically filtered out when using the Intel
  compiler. Now, the filters are evaluated when a function is
  encountered for the first time.
- Fix handling of Intel compiler options starting with "-o".
- The pgCC compiler version 13.9 and newer preinclude omp.h if OpenMP
  is enabled. This leads to multiply defined symbols if the source
  file is preprocessed before compilation. Prevent the preinclusion
  for the compilation of preprocessed files if an appropriate compiler
  option exists (exists since pgCC version 14.1).
- Fix a deadlock on AIX when MPI_Abort was called.
- If a system provides only shared OpenMP runtime libraries and a
  compiler does not add rpath information but relies on
  LD_LIBRARY_PATH, the Score-P instrumenter failed to execute. Fixed.
- Fix missing flags in OPARI2 call to disable OpenMP instrumentation,
  if the user selected POMP instrumentation for a serial program
  without specifying that the program is serial.
- Prepend the settings of VT_LIB_DIR and VT_LIBS to link calls of the
  Intel compiler to avoid remarks.
- Changed enumeration of threads in the profile from a global
  enumeration to an enumeration from 0 to N-1 on each process.
- Use "-G2" if the Cray compiler instrumentation is used.
  The previous "-g" flag disabled all optimizations.
- Fix creation of the experiment directory if the monitored application
  makes use of the 'chdir' operation.
- The Score-P instrumenter tool moved compiler selection flags for the
  MPI compiler wrapper to a different location in the command
  line. Fixed.
- Fixed broken instrumentation if the application's link step
  explicitly links libc.
- Fixed wrong acquisition order attribute passed to acquire lock
  events from OpenMP critical sections.

------------------- Released version 1.2.3 ---------------------------

- Fixed a failed assertion that occurs if selective recording was
  enabled in profiling mode.
- Fixed wrong path names in the instrumenter, when Score-P was
  configured with the --bindir flag.
- Install scorep-score in the correct directory, if Score-P was
  configured with the --bindir flag.
- Reduce per-event measurement overhead by improving Score-P's assert
  and error handling.
- Adapt configure to recent Cray installations.
- Score-P measurements provided with a SCOREP_EXPERIMENT_DIRECTORY,
  say foo, used to overwrite an existing foo even if this foo is not a
  directory. Measurements now abort with a meaningful message instead.
- Metric plugin component: handling of multiple metrics improved.
- Don't remove source files during make distclean in an in-place
  build.
- Fix failing detection of nvcc in case it was called with a path.
- The measurement configuration (stored in the file `scorep.cfg') is
  now also preserved in the experiment directory in case of a failed
  measurement.
- Added compiler instrumentation flags also to the ldflags to fix
  missing instrumentation if high optimization levels recompile parts
  of the code.
- Changed the region names of OPARI2 instrumented named criticals.
  If a name for the critical region is provided, the enclosing region
  will have the name '!$omp critical <name>' and the structured block
  '!$omp critical sblock'. Replace <name> by the given name.

------------------- Released version 1.2.2 ---------------------------

- The Fortran Cray compiler instrumentation did not create an exit
  event. Thus, we add an exit on Score-P finalization.
- Removed remark of the Intel compiler during instrumentation that
  VT_ROOT is not set, if preprocessing was used.
- MPI parallel measurements with just one process were fixed.
- Fixed a race condition during initialization of the
  TRACE_BUFFER_FLUSH region, that could lead to incomplete profiles if
  a user runs a hybrid (MPI + OpenMP) application and enables
  profiling and tracing at the same time.
- Fix error message when scorep-config is called without arguments in
  a non-MPI installation.
- In scorep-config's rpath options, omit paths searched by ldconfig,
  even if Score-P was installed there, in order to comply with packaging
  guidelines of some Linux distributions.
- Fixed broken MPI detection in the instrumenter if the MPI compiler
  wrapper is specified with the full path.
- If Score-P is built with static and dynamic libraries, the selection
  of using static or dynamic libraries was improved. Using -Bstatic or
  -Bshared had some side effects and was sometimes unreliable.
- On Cray systems, changed libtool's default to prefer static linking of
  external libraries.
- Suppress failed assertion messages when initializing compiler
  instrumentation with Intel compilers without libbfd. The measurement
  completes even if these messages exist.
- Added options to scorep-config and the scorep instrumenter to
  enable/disable online access support.
- Fixed broken --includedir configure option that installed Score-P
  headers in a wrong directory.
- Fix SCOREP_RECORDING_IS_ON(isOn) user macro; in Fortran codes, isOn
  was not set to false when instrumented with --nouser.
- Fixed instrumentation compilation error that occurred if
  --opari="--disable=atomic" was specified without OpenMP compilation
  flags.
- Improvements in obtaining region information via libbfd.
- Improved configure checks to determine values of MPI
  constants. Previous tests failed on AIX.
- Improvements of measurement reconfiguration in Online Access mode.
- Honor --without-mpi when --with-custom-compilers is given at
  configure time.
- Several smaller fixes.

------------------- Released version 1.2.1 ---------------------------

- Allow configuration without support for the MPI programming model by
  specifying --without-mpi on the configure line.
- Abort during instrumentation with a meaningful error message if
  a user requests MPI but the Score-P installation does not support MPI.
- On Blue Gene/Q, detect the PAMI library at configure time. The
  location and names of the PAMI files changed during a system upgrade;
  all known directories and library names are now searched.
- Improve --with-custom-compilers, customization files are now
  recognized also in the build directory (see INSTALL).
- On SGI MPT systems, or more generally on systems that don't use
  compiler wrappers for building MPI programs, improve the automatic
  detection of the MPI programming paradigm during instrumentation.
- Abort with an error message during instrumentation if the user wants
  to build a shared library with static Score-P libraries.
- Abort if the user specified a filter file which cannot be opened.
- Improved the auto-detection in the instrumenter for MPI libraries. This
  should fix some failures with MPI programs that do not use a compiler
  wrapper, e.g., when using SGI MPT.
- Fixed the instrumenter failing to detect whether an application
  uses OpenMP with the XL compiler if the user specified more than one
  option to '-qsmp='.
- Abort configuration when the user specified --without-cube on the
  command line, as CUBE is a required component.

------------------- Released version 1.2 -----------------------------

- Simplified MPI compiler detection, passing '--with-mpi' to configure
  is usually not necessary if your MPI compiler is in PATH.
- Support for Cray systems. PrgEnv-(cray|gnu|intel|pgi) are supported
  in static mode (static is the default). Please note that OpenMP
  instrumentation is currently broken for PrgEnv-cray.
- Compilation units processed by OPARI2 are now preprocessed by the
  C/C++ preprocessor. This way it is possible to instrument OpenMP
  directives in header files. It also solves instrumentation problems
  caused by OpenMP pragmas within preprocessor defines. Preprocessing
  is the default but can be deactivated using --nopreprocess. When
  using PDT instrumentation, preprocessing is deactivated.
- To reduce the memory demands of dynamic regions in profiling mode,
  this version provides a lossy compression mechanism called
  'clustering'; similar subtrees of a dynamic region are clustered
  into one. This feature is enabled by default. There are three new
  environment variables for customization, please see the documentation
  for details.
- The new keyword 'MANGLED' was added to the filter file format to
  deal with cases where the displayed name and mangled name are
  different. The keyword 'FORTRAN' was removed.
- External metric sources can be utilized via a plug-in mechanism.
  This feature is controlled via the SCOREP_METRIC_PLUGIN environment
  variable. Please see the documentation for details and an example.
- The CUDA adapter got refactored and extended to provide much more
  useful metrics. There are several new values to the environment
  variable SCOREP_CUDA_ENABLE. Please see the documentation for
  details.
- The machine name used in the profile and trace output is now
  configurable at build time with the --with-machine-name flag or at
  run-time with the SCOREP_MACHINE_NAME measurement configuration
  variable.
- Full support for tracking OpenMP thread teams, utilizing the new
  generic threading records of OTF2.
- The Score-P internals were significantly refactored in order to
  increase flexibility to adapt to new programming paradigms and event
  sources.
- Please note that the feature 'selective tracing' was renamed to
  'selective recording' as it also applies to profiling.
- Please note that CUBE is a hard requirement when building Score-P
  from a tarball. This is because we want to provide the user with
  'scorep-score', which cannot be built without the CUBE reader
  library.

------------------- Released version 1.1 -----------------------------

- Rewind, a new event-trace recording mode for long-running
  experiments, triggered by user-instrumentation macros. Writes
  semantic information to the OTF2 anchor file, as rewind might affect
  analysis.
- ARM support (detection + compiler adapter).
- Metric service improvements. Support for per-process metrics and
  per-system-tree-class metrics.
- Support for OpenMP-task profiling and tracing along with
  improvements of the POMP adapter.
- Component separation: Score-P can now use pre-installed OTF2,
  OPARI2, and CUBE packages instead of the internal ones.
  - Removed the dependency on an external repository that was used by
    Score-P, OTF2, and OPARI2 in order to prevent version conflicts.
- Support for CUDA profiling and tracing.
- Easier experiment configuration via scorep-info which provides a
  list of all measurement configuration variables.
- scorep-info also provides the improved configure-summary of the
  installation.
- Scoring of profile experiments via scorep-score (if configured with
  external CUBE) to prepare a filter for a subsequent trace experiment.
- Documentation improvements.
- Numerous configure improvements. Let external libraries use
  generic configure options (tbc). Fixed portability issues.
- Numerous instrumenter improvements. All possible combinations of
  options supported.
- MPI profiling improvements.
- OpenMP nesting supported although little tested.
- Several compiler-dependent OpenMP-related bugfixes.

------------------- Released version 1.0.2 ---------------------------

- Several instrumentation fixes:
  - Improvements for PDT Fortran instrumentation.
  - Improvements for C++ user instrumentation.
  - Return a real failure if instrumentation is erroneous. Previously,
    failures may have gone undetected.
  - Allow for out-of-place builds.
  - Provide correct parameter to SCOREP_USER_REGION_ENTER macro.

- Provide correct timestamp to OmpTaskCreate events.

- Fix invalid order of arguments provided to MpiCollectiveEnd events.

- Fix bug in parameter profiling.

- Enable SIONlib support, currently just for MPI applications.

- Various fixes for the generated OpenMP region names:
  - Inner and outer blocks got different names.
  - Regions with the ordered clause got a special name.
  - All region names got '@file:lno' appended to make them
    distinguishable.

------------------- Released version 1.0.1 ---------------------------

- Renaming of the configure related variable LD_FLAGS_FOR_BUILD to
  LDFLAGS_FOR_BUILD for consistency.

- Renaming of installed tools and options for consistency, i.e.,
  changing underscores to dashes. Also, the --(no)openmp_support
  option changed to --(no)openmp.

- Improved linking on AIX systems.

- Robustness improvements when instrumenting with PDT.

- On x86 platforms, be more cautious using the tsc counter. If
  /proc/cpuinfo reports constant_tsc but not nonstop_tsc, then it is
  likely that the counter is unreliable.

- Improved configure summary.

- configure will not fail if -q or --silent is passed.

------------------- Released version 1.0 -----------------------------
