Commit ed9f21bc authored by Sergey Yakubov

Merge pull request #184 in ASAPO/asapo from docs to develop

* commit 'cc97c70f':
  fix cmake
  freeze 21.12.0
parents 20fd84a8 cc97c70f
## 21.12.0
FEATURES
* Consumer API: Get last within consumer group returns message only once
* Producer API: An option to write raw data to core filesystem directly
* Consumer/Producer API - packages for Debian 11.1, wheel for Python 3.9
* Consumer/Producer API - dropped Python 2 support for wheels and packages for new Debian/CentOS versions
INTERNAL
* Improved logging - tags for beamline, beamtime, ...
* Updated orchestration tools to latest version
## 21.09.0
FEATURES
......
function(cleanup varname)
    string(REPLACE "-" "_" out ${${varname}})
    SET(${varname} ${out} PARENT_SCOPE)
endfunction()
execute_process(COMMAND git describe --tags --abbrev=0
        OUTPUT_VARIABLE ASAPO_TAG
        WORKING_DIRECTORY ..)
string(STRIP ${ASAPO_TAG} ASAPO_TAG)
execute_process(COMMAND git rev-parse --abbrev-ref HEAD
        OUTPUT_VARIABLE BRANCH
        WORKING_DIRECTORY ..)
string(STRIP ${BRANCH} BRANCH)
cleanup(BRANCH)
@@ -20,32 +20,37 @@ execute_process(COMMAND git rev-parse --short=10 HEAD
string(STRIP ${ASAPO_VERSION_COMMIT} ASAPO_VERSION_COMMIT)
if (${BRANCH} STREQUAL "master")
    SET(ASAPO_VERSION ${ASAPO_TAG})
    SET(ASAPO_VERSION_COMMIT "")
    SET(ASAPO_VERSION_DOCKER_SUFFIX "")
    SET(PYTHON_ASAPO_VERSION ${ASAPO_VERSION})
    string(REGEX REPLACE "\\.0([0-9]+)\\."
            ".\\1." ASAPO_WHEEL_VERSION
            ${ASAPO_VERSION})
else ()
    SET(ASAPO_VERSION ${BRANCH})
    SET(ASAPO_VERSION_COMMIT ", build ${ASAPO_VERSION_COMMIT}")
    SET(ASAPO_VERSION_DOCKER_SUFFIX "-dev")
    string(REPLACE "_" "-" ASAPO_VERSION ${ASAPO_VERSION})
    SET(ASAPO_VERSION 100.0.${ASAPO_VERSION})
    if (${BRANCH} STREQUAL "develop")
        SET(PYTHON_ASAPO_VERSION 100.0.dev0)
    else ()
        string(FIND ${BRANCH} feature_ASAPO pos)
        if (${pos} EQUAL 0)
            string(SUBSTRING ${ASAPO_VERSION} 20 -1 TMP)
            string(REGEX MATCH "^([0-9]+)|.+$" ISSUE_NUM "${TMP}")
            if (ISSUE_NUM STREQUAL "")
                SET(PYTHON_ASAPO_VERSION 100.0.dev1)
            else ()
                SET(PYTHON_ASAPO_VERSION 100.0.dev${ISSUE_NUM})
            endif ()
        else ()
            SET(PYTHON_ASAPO_VERSION 100.0.dev1)
        endif ()
    endif ()
    SET(ASAPO_WHEEL_VERSION ${ASAPO_VERSION})
endif ()
message("Asapo Version: " ${ASAPO_VERSION})
message("Python Asapo Version: " ${PYTHON_ASAPO_VERSION})
......
### Producer Protocol
| Release | used by client | Supported by server | Status |
| ------------ | ------------------- | -------------------- | ---------------- |
| v0.5         | 21.12.0 - 21.12.0   | 21.12.0 - 21.12.0    | Current version  |
| v0.4         | 21.09.0 - 21.09.0   | 21.09.0 - 21.12.0    | Deprecates from 01.12.2022 |
| v0.3         | 21.06.0 - 21.06.0   | 21.06.0 - 21.12.0    | Deprecates from 01.09.2022 |
| v0.2         | 21.03.2 - 21.03.2   | 21.03.2 - 21.12.0    | Deprecates from 01.07.2022 |
| v0.1         | 21.03.0 - 21.03.1   | 21.03.0 - 21.12.0    | Deprecates from 01.06.2022 |
### Consumer Protocol
| Release | used by client | Supported by server | Status |
| ------------ | ------------------- | -------------------- | ---------------- |
| v0.5         | 21.12.0 - 21.12.0   | 21.12.0 - 21.12.0    | Current version  |
| v0.4         | 21.06.0 - 21.09.0   | 21.06.0 - 21.12.0    | Deprecates from 01.12.2022 |
| v0.3         | 21.03.3 - 21.03.3   | 21.03.3 - 21.12.0    | Deprecates from 01.07.2022 |
| v0.2         | 21.03.2 - 21.03.2   | 21.03.2 - 21.12.0    | Deprecates from 01.06.2022 |
| v0.1         | 21.03.0 - 21.03.1   | 21.03.0 - 21.12.0    | Deprecates from 01.06.2022 |
@@ -2,25 +2,25 @@
| Release | API changed\*\* | Protocol | Supported by server from/to | Status |Comment|
| ------------ | ----------- | -------- | ------------------------- | --------------------- | ------- |
| 21.12.0 | Yes | v0.5 | 21.12.0/21.12.0 | current version | |
| 21.09.0 | Yes | v0.4 | 21.09.0/21.12.0 | deprecates 01.12.2022 |beamline token for raw |
| 21.06.0 | Yes | v0.3 | 21.06.0/21.12.0 | deprecates 01.09.2022 |arbitrary characters|
| 21.03.3 | No | v0.2 | 21.03.2/21.12.0 | deprecates 01.07.2022 |bugfix in server|
| 21.03.2 | Yes | v0.2 | 21.03.2/21.12.0 | deprecates 01.07.2022 |bugfixes, add delete_stream|
| 21.03.1 | No | v0.1 | 21.03.0/21.12.0 | deprecates 01.06.2022 |bugfix in server|
| 21.03.0 | Yes | v0.1 | 21.03.0/21.12.0 | deprecates 01.06.2022 | |
### Consumer API
| Release | API changed\*\* | Protocol | Supported by server from/to | Status |Comment|
| ------------ | ----------- | --------- | ------------------------- | ---------------- | ------- |
| 21.12.0 | Yes | v0.5 | 21.12.0/21.12.0 | current version | |
| 21.09.0 | No | v0.4 | 21.06.0/21.12.0 | deprecates 01.12.2022 | |
| 21.06.0 | Yes | v0.4 | 21.06.0/21.12.0 | deprecates 01.09.2022 |arbitrary characters, bugfixes |
| 21.03.3 | Yes | v0.3 | 21.03.3/21.12.0 | deprecates 01.06.2022 |bugfix in server, error type for duplicated ack|
| 21.03.2 | Yes | v0.2 | 21.03.2/21.12.0 | deprecates 01.06.2022 |bugfixes, add delete_stream|
| 21.03.1 | No | v0.1 | 21.03.0/21.12.0 | deprecates 01.06.2022 |bugfix in server|
| 21.03.0 | Yes | v0.1 | 21.03.0/21.12.0 | deprecates 01.06.2022 | |
\* insignificant changes/bugfixes (e.g. in return type, etc), normally do not require client code changes, but formally might break the client
......
---
title: Version 21.12.0
author: Sergey Yakubov
author_title: DESY IT
tags: [release]
---
# Changelog for version 21.12.0
FEATURES
* Consumer API: Get last within consumer group returns message only once
* Producer API: An option to write raw data to core filesystem directly
* Consumer/Producer API - packages for Debian 11.1, wheel for Python 3.9
* Consumer/Producer API - dropped Python 2 support for wheels and packages for new Debian/CentOS versions
INTERNAL
* Improved logging - tags for beamline, beamtime, ...
* Updated orchestration tools to latest version
@@ -2,7 +2,6 @@
if [[ -z "${DOCS_VERSION}" ]]; then
echo No version specified
exit 1
fi
@@ -27,22 +26,28 @@ CONTENT='content=\"\.\/'
# replace the links to the code examples with links to the frozen copies
for file in $(find ./versioned_docs/version-$DOCS_VERSION -type f)
do
  if [[ `uname -s` == "Darwin" ]]; then
    sed -i '' -e "s/content=\"\.\/examples/content=\".\/${VERSIONED_EXAMPLES_ESCAPED}/g" $file
  else
    sed -i -e "s/content=\"\.\/examples/content=\".\/${VERSIONED_EXAMPLES_ESCAPED}/g" $file
  fi
done
# replace the links to the dev-packages with links to the versioned ones
read -r -d '' template << EOM
-e s/asapo-cluster-dev:100\.0\.develop/asapo-cluster:${DOCS_VERSION}/g
-e s/==100\.0\.dev0/==${VERSION_FOR_PIP}/g
-e s/100\.0[~.]develop/${DOCS_VERSION}/g
-e s/100\.0[~.]dev0/${DOCS_VERSION}/g
EOM
for file in $(find ./${VERSIONED_EXAMPLES} -type f)
do
  if [[ `uname -s` == "Darwin" ]]; then
    sed -i '' $template $file
  else
    sed -i $template $file
  fi
done
exit 0
---
title: Comparison to Other Solutions
---
Here we consider how ASAPO differs from other workflows practiced at DESY. The candidates for comparison are:
### Filesystem
Probably the most commonly used approach at the moment. Files are written to the beamline filesystem directly via an NFS/SMB mount or by HiDRA and copied to the core filesystem by a copy daemon. User software then reads the files from the filesystem.
### Filesystem + Kafka
The previous workflow, plus a Kafka instance that produces messages when a file appears in the core filesystem. These messages can then be consumed by user software.
### HiDRA
HiDRA can work in two modes: either data is transferred through it, or data is written over NFS/SMB mounts and HiDRA monitors a folder in the beamline filesystem. In both cases one can subscribe to HiDRA's data queue to be informed about new files.
### ASAPO
ASAPO works with data streams rather than files. Behind the scenes it does use files, but what a user sees is a stream of messages, where a message typically consists of metadata and a data blob. A data blob can be a single image, a file with arbitrary content, or anything else, even null. What matters is that whatever goes in at one end appears at the other, and that each message has a consecutive index. Messages are ingested into an ASAPO data stream in some way (e.g. using the ASAPO Producer API or HiDRA), and the data, if not there yet, is transferred to and stored in the data center. A user can then read the messages from the stream and process them as desired.
### Compare by categories
In the table below we compare the approaches from different points of view and in the [next table](#compare-by-features) we compare the approaches by available features.
| Category | Filesystem | Filesystem+Kafka | HiDRA | ASAPO |
|----------------|------------|------------------|-------|-------|
| Data ingest | traditional way - write to disk | same as Filesystem; additionally a message with basic file metadata (name, size, creation date, ...) is sent to a Kafka topic | outperforms data ingest via NFS/SMB, uses ZeroMQ (TCP), saturates 10GE bandwidth; to be tested with 100GE. Same metadata as with Kafka | uses parallel TCP connections & in the future RDMA. Can add arbitrary metadata (JSON format). Can send data to various streams to create an arbitrary data hierarchy (e.g. stream per scan). Saves data to a memory buffer and on disk. |
| Offline analysis | traditional way - read from disk. Need to know the filename to read, usually not a problem. | same as Filesystem; there is no benefit in reading the Kafka queue after all data has arrived | not possible, need to fall back to Filesystem | same as online analysis (see below) |
| Online analysis | no efficient way to recognise that new data is available - periodically read the folder content and compare with the previous state? periodically check whether a file appeared? | using a subscription to a "new files" topic, user software can be made aware of new data quite soon and react correspondingly | one can subscribe to HiDRA's ZeroMQ stream and consume data. If data arrives faster than it can be processed, or e.g. user software crashes, data might be skipped | one can get data from various streams in different ways - get the next unprocessed message ordered by index, get the last one, get by id. Since everything is kept in persistent storage, processing is possible with arbitrarily slow (but also fast) consumers. Resilient to connection loss or consumer crashes. |
| Performance | as good as read/write to disk. Can be an issue, especially with new detectors and 100GE+ networks. | as good as read/write to disk + some latency for the file to be written to the beamline filesystem, copied to the core filesystem and a message to go through Kafka | data is available as soon as it is received by HiDRA. If multiple consumer groups want to read the same data (one consumer is known - the process that writes the file), data will be transferred multiple times, which affects network performance | data is available to be consumed as soon as it has arrived and been saved to the beamline filesystem (later this can be optimised by using persistent memory instead). No need to read from disk since the data also remains in the memory buffer. |
| Parallelisation | parallelisation is easily possible, e.g. with an MPI library | parallelisation is possible if Kafka's topics are partitioned (which is not the case at the moment) | not out of the box, possible with some changes on the user's side | parallelisation is easily possible: one can consume data concurrently with different consumers from the same stream. Normally, synchronisation between consumers is not needed, but this might depend on the use case. When configured, data can be resent if not acknowledged within a specified time period. |
| Search/filter | hardly possible - manually parsing some metadata file, using POSIX commands? | same as Filesystem. There is the Kafka SQL query language, which could be used if there were metadata in the messages, which is not the case (yet?) | not possible | one can use a set of SQL queries on the ingested metadata |
| General comments | might be OK for slow detectors or/and use cases without online analysis requirements. Might be the only way to work with legacy applications | fits well for cases where software just needs a trigger that some new data has arrived, and where processing order, extra metadata and parallelisation are not that important or are implemented by other means. There is some delay between an image being generated and the event being emitted by Kafka, but probably not a significant one (a couple of seconds at most). Might be less appropriate for very fast detectors since it still uses the filesystem to write/read data | works quite well for transferring files from the detector to the data center. Also a good candidate for live viewers, where the last available "image" should be displayed. Does not work for offline analysis or for near real-time analysis where image processing can take longer than image taking | tries to be a general solution which improves in areas where other approaches do not suffice: single code for offline/near real-time/online analysis, parallelisation, extended metadata, efficient memory/storage management, getting data without access to the filesystem (e.g. from a detector PC without the core filesystem mounted), computational pipelines, ... Everything has its price: user software must be modified to use ASAPO, a wrapper might be needed for legacy software that cannot be modified, and the user/beamtime scientist has to structure the data better - e.g. consecutive indexes must be available for each image, one has to define to which stream to write/read data, what the format of the data is, ... |
### Compare by features
| Feature | Filesystem | Filesystem+Kafka | HiDRA | ASAPO |
|--------------------------------------------------------------------------------------------------------------------|------------------------------------|------------------------------------|------------------------------------|------------------------------------|
| send metadata with image | No | No | No | Yes |
| get last image | No | No | Yes | Yes |
| get image by id | No | No | No | Yes |
| get image in order | No | No | No | Yes |
| Immediately get informed that a new image has arrived | No | Yes | Yes | Yes |
| access image remotely, without reading filesystem | No | No | Yes, if it is still in buffer | Yes |
| access past images | Yes | Yes | No | Yes |
| need to change user code | No | Yes | Yes | Yes |
| parallelisation | Yes (if user software allows that) | Not out of the box | Not out of the box | Yes |
| legacy applications | Yes | No (wrapper could be a workaround) | No (wrapper could be a workaround) | No (wrapper could be a workaround) |
| transparent restart/continuation of simulations in case e.g. worker process crashes, also for parallel simulations | Not out of the box | Yes | No | Yes |
---
title: Consumer Clients
---
A consumer client (or consumer) is the part of a distributed streaming system that is responsible for processing streams of data created by a producer. It is usually the user's (beamline scientist, detector developer, physicist, ...) responsibility to develop a client for a specific beamline, detector or experiment using the ASAPO Consumer API, and ASAPO's responsibility to make sure the data is delivered to consumers in an efficient and reliable way.
![Docusaurus](/img/consumer-clients.png)
Consumer API is available for C++ and Python and has the following main functionality:
- Create a consumer instance and bind it to a specific beamtime and data source
  - multiple instances can be created (also within a single application) to receive data from different sources
  - a beamtime token is used for access control
- If needed (mainly for the get_next_XX operations), create a consumer group that allows messages to be processed independently from other groups
- Receive messages from a specific stream (you can read more [here](data-in-asapo) about data in ASAPO)
  - GetNext to receive and process messages one after another without needing to know message indexes
    - the Consumer API returns a message with index 1, then 2, ... as they were set by the producer
    - this also works in parallel, so that the payload is distributed among multiple consumers within the same consumer group or between threads of a single consumer instance; in the parallel case the order of message indexes is not determined
  - GetLast to receive the last available message - e.g. for live visualisation
  - GetById - get a message by index - provides random access
- Make queries based on metadata contained in a message - returns all messages in a stream with specific metadata; a subset of the SQL language is used
All of the above functions can return only the metadata part of the message, so that an application can e.g. extract the filename and pass it to a 3rd party tool for processing. Alternatively, a function may return the complete message with metadata and data so that the consumer can process it directly. Access to the filesystem where the data is actually stored is not required in this case.
:::note
In case of the dataset family of functions, only the list of dataset messages is returned; the data can be retrieved in a separate call.
:::
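To give an impression of how this looks in code, here is a minimal Python sketch of the functionality listed above. It is illustrative only: the parameter values are placeholders, and the exact signatures should be checked against the API documentation linked below.
```python
import asapo_consumer

# Bind a consumer to a beamtime and data source (all values below are placeholders).
consumer = asapo_consumer.create_consumer(
    "localhost:8400",   # endpoint
    "/asapo/data",      # path to the data filesystem (if accessible)
    True,               # has_filesystem
    "asapo_test",       # beamtime id
    "test_source",      # data source
    "<token>",          # beamtime token
    5000)               # timeout in ms

group_id = consumer.generate_group_id()

# GetNext: messages are delivered in index order within the consumer group.
data, meta = consumer.get_next(group_id, meta_only=False)

# GetLast: the most recent message, e.g. for a live viewer (metadata only here).
_, last_meta = consumer.get_last(meta_only=True)

# GetById: random access by message index.
data3, meta3 = consumer.get_by_id(3, meta_only=False)

# Metadata query using a subset of SQL.
selected = consumer.query_messages("meta.user_field = 'value'")
```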
Please refer to [C++](http://asapo.desy.de/cpp/) and [Python](http://asapo.desy.de/python/) documentation for specific details (available from DESY intranet only).
---
title: Acknowledgements
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
While consuming messages we can issue acknowledgements to denote that the messages were (or were not) processed successfully.
Here is a snippet that expects 10 sample messages in the default stream. When consuming the messages, message #3 receives a negative acknowledgement, which puts it back in the stream for repeated processing, and messages #5 and #7 remain unacknowledged. On the second attempt message #3 gets acknowledged.
You can find the full example in the git repository.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/acknowledgements.py" snippetTag="consume"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/acknowledgements.cpp" snippetTag="consume"
```
</TabItem>
</Tabs>
The list of unacknowledged messages can be accessed at any time. The following snippet prints it.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/acknowledgements.py" snippetTag="print"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/acknowledgements.cpp" snippetTag="print"
```
</TabItem>
</Tabs>
The output will show the order in which the messages receive their acknowledgements. You may notice that the second acknowledgement of message #3 happens with a delay, which is deliberate. The unacknowledged messages are retrieved separately at the end, after the consumer timeout.
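For orientation, a condensed sketch of the acknowledgement calls used above is shown below, assuming a `consumer` and `group_id` created as in the Simple Consumer example; the exact names `acknowledge`, `neg_acknowledge` and `get_unacknowledged_messages` should be checked against the referenced snippet files.
```python
# Sketch only: 'consumer' and 'group_id' are assumed to exist already.
data, meta = consumer.get_next(group_id, meta_only=True)
message_id = meta["_id"]  # the message index field name is assumed here

if message_id == 3:
    # Negative acknowledgement: the message is put back and redelivered after a delay.
    consumer.neg_acknowledge(group_id, message_id, delay_ms=2000)
elif message_id in (5, 7):
    # Leave these unacknowledged on purpose.
    pass
else:
    # Positive acknowledgement: the message is marked as processed.
    consumer.acknowledge(group_id, message_id)

# The list of unacknowledged messages can be inspected at any time.
print(consumer.get_unacknowledged_messages(group_id))
```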
---
title: Datasets
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Messages in a stream can be multi-parted. If you have several producers (e.g. sub-detectors) that produce parts of a single message, you can use datasets to assemble a single message from several parts.
## Dataset Producer
Here is a code snippet that can be used to produce a three-part dataset. The full runnable example can be found in the git repository.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/produce_dataset.py" snippetTag="dataset"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/produce_dataset.cpp" snippetTag="dataset"
```
</TabItem>
</Tabs>
You should see the "successfully sent" message in the logs, and the file should appear in the corresponding folder (by default in ```/var/tmp/asapo/global_shared/data/test_facility/gpfs/test/2019/data/asapo_test```)
## Dataset Consumer
Here is the snippet that can be used to consume a dataset. The full example is also in the git repository.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/consume_dataset.py" snippetTag="dataset"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/consume_dataset.cpp" snippetTag="dataset"
```
</TabItem>
</Tabs>
The details about the received dataset should appear in the logs, together with the message "stream finished" (if the "finished" flag was sent for the stream). The "stream ended" message will appear for non-finished streams, but may also mean that the stream does not exist (or was deleted).
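As a rough outline of what the referenced snippets do: a dataset part is sent by specifying the part number and the total dataset size, and the whole dataset is consumed with the dataset variant of the get-calls. The sketch below assumes a `producer`, `consumer`, `group_id` and `callback` as in the other examples; the keyword names `dataset_substream` and `dataset_size`, the `get_next_dataset` call and the returned field names are assumptions to be verified against the snippet files.
```python
# Producer side (sketch): send part 2 of a three-part dataset as message 1.
producer.send(1, "processed/test_file_part2", b"data",
              dataset_substream=2, dataset_size=3,
              stream="default", callback=callback)
producer.wait_requests_finished(2000)

# Consumer side (sketch): the dataset is returned as a list of metadata records;
# the data of each part is fetched in a separate call.
dataset = consumer.get_next_dataset(group_id, stream="default")
for part_meta in dataset["content"]:
    data = consumer.retrieve_data(part_meta)
```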
---
title: Metadata
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
You can also store any custom metadata with your beamtime, stream, and each message. This tutorial shows you how you can store, update and access this metadata. The metadata is stored in JSON, and any JSON structure is supported.
:::info
Since C++ doesn't have built-in JSON support, you'd have to use 3rd party libraries if you want JSON parsing. In this tutorial we won't use any JSON parsing for C++, and will treat JSON documents as regular strings. Please note that ASAP::O only accepts valid JSON, and providing invalid input will result in an error.
:::
## Send Metadata
The following snippet shows how to send the beamtime metadata.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="beamtime_set"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="beamtime_set"
```
</TabItem>
</Tabs>
The metadata can be updated at any moment. Here is an example of how to do it with the beamtime metadata.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="beamtime_update"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="beamtime_update"
```
</TabItem>
</Tabs>
In the same way, metadata can be set for each stream.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="stream_set"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="stream_set"
```
</TabItem>
</Tabs>
And for each message
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="message_set"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="message_set"
```
</TabItem>
</Tabs>
## Read Metadata
Here we will read the beamtime metadata. In this example it will already incorporate the changes we made during the update.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="beamtime_get"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="beamtime_get"
```
</TabItem>
</Tabs>
Same for the stream.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="stream_get"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="stream_get"
```
</TabItem>
</Tabs>
And for the message.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/metadata.py" snippetTag="message_get"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/metadata.cpp" snippetTag="message_get"
```
</TabItem>
</Tabs>
The output will show the metadata retrieved from the beamtime, stream and message.
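In compressed form, the calls exercised by the snippets above look roughly like the sketch below. It assumes a `producer`, `consumer` and `callback` created as in the other examples; the names `send_beamtime_meta`, `get_beamtime_meta` and the `user_meta` argument are assumptions based on the 21.12.0 Python API and should be verified against the referenced files.
```python
import json

# Producer side (sketch): beamtime-level metadata as a JSON string.
beamtime_meta = json.dumps({"condition": "beamtime_meta", "somevalue": 10})
producer.send_beamtime_meta(beamtime_meta, callback=callback)

# Per-message metadata travels together with the data.
message_meta = json.dumps({"condition": "message_meta", "somevalue": 11})
producer.send(1, "processed/test_file", b"data",
              user_meta=message_meta, stream="default", callback=callback)
producer.wait_requests_finished(2000)

# Consumer side (sketch): read the metadata back.
print(consumer.get_beamtime_meta())
_, meta = consumer.get_by_id(1, meta_only=True)
print(meta["meta"])  # field name for the user metadata is assumed here
```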
---
title: Stream Finishing
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
When all the data in a stream has been sent, the stream can be finished, and it is possible to set the "next stream" to follow the first one. This tutorial shows how several streams can be chained together in a single consumer by using stream finishing.
The next stream is set by providing an additional parameter when finishing the stream.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/next_stream.py" snippetTag="next_stream_set"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/next_stream.cpp" snippetTag="next_stream_set"
```
</TabItem>
</Tabs>
The reading of the streams can then be chained together. When one stream finishes and the next stream is provided, reading of the next stream can start immediately. This example reads the whole chain of streams until it encounters a non-finished stream, or a stream that was finished without the ```next```.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/next_stream.py" snippetTag="read_stream"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/next_stream.cpp" snippetTag="read_stream"
```
</TabItem>
</Tabs>
The output will show the messages being consumed from the streams in order. For this example (the full file can be found in the git repository) it will be the ```default``` stream first, then ```next```.
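Schematically, the chaining relies on the "finished" flag carrying the name of the next stream. The sketch below assumes a `producer`, `consumer`, `group_id` and `last_id` as in the other examples; the `send_stream_finished_flag` signature, the `next_stream` argument and the `nextStream` field of the stream info are assumptions to be verified against the referenced snippets.
```python
import asapo_consumer

# Producer side (sketch): finish 'default' after message 'last_id' and point to 'next'.
producer.send_stream_finished_flag("default", last_id, next_stream="next")
producer.wait_requests_finished(2000)

# Consumer side (sketch): follow the chain until a stream has no successor.
stream = "default"
while stream:
    try:
        data, meta = consumer.get_next(group_id, stream=stream, meta_only=True)
        # ... process the message ...
    except asapo_consumer.AsapoStreamFinishedError:
        # Look up the successor of the finished stream in the stream list.
        info = next(s for s in consumer.get_stream_list() if s["name"] == stream)
        stream = info.get("nextStream") or None
    except asapo_consumer.AsapoEndOfStreamError:
        break  # no new messages and the stream is not finished yet
```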
---
title: Code Examples Overview
---
Here you can find code examples for various common ASAPO use cases. Make sure that the ASAP::O instance and client libraries are properly installed; see the [Getting Started page](../) for details.
For the most basic use case, see the [Simple Producer](simple-producer) and [Simple Consumer](simple-consumer). There are also basic examples of CMake and makefile configurations for client compilation.
The API documentation can be found [here](http://asapo.desy.de/cpp) (for C++) or [here](http://asapo.desy.de/python) (for Python).
:::tip
You can see more examples in ASAPO [source code](https://stash.desy.de/projects/ASAPO/repos/asapo/browse/examples)
:::
---
title: Message query
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
Messages in streams can be retrieved based on their metadata. Both the technical information (e.g. ID or timestamp) and the user metadata (see [this tutorial](metadata) for details) can be used to make a query. In this tutorial several examples of such queries are shown. Standard SQL syntax is used.
For this example we expect several messages in the default stream with metadata consisting of two fields: a string named ```condition``` and an integer named ```somevalue```. Go to the git repository for the full example.
:::info
Keep in mind that query requests return only the list of metadata records for the found messages, not the messages themselves. You need to explicitly retrieve the actual data for each message.
:::
Here we can pick a message with a specific ID.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/query.py" snippetTag="by_id"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/query.cpp" snippetTag="by_id"
```
</TabItem>
</Tabs>
We can also use a simple rule to pick a range of IDs.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/query.py" snippetTag="by_ids"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/query.cpp" snippetTag="by_ids"
```
</TabItem>
</Tabs>
We can query the messages based on their metadata, for example requesting a specific value of the string field.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/query.py" snippetTag="string_equal"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/query.cpp" snippetTag="string_equal"
```
</TabItem>
</Tabs>
We can also apply more complex constraints on the metadata, e.g. a range for an integer field.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/query.py" snippetTag="int_compare"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/query.cpp" snippetTag="int_compare"
```
</TabItem>
</Tabs>
Since every message comes with a timestamp, we can make constraints on it as well. For example, request all the messages from the last 15 minutes.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/query.py" snippetTag="timestamp"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/query.cpp" snippetTag="timestamp"
```
</TabItem>
</Tabs>
The output of the full example will show the message selection together with the conditions used for selection.
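Put together, the queries above are plain strings passed to a single call, roughly as in this sketch. It assumes a `consumer` as in the other examples; the `query_messages` call, the `meta.` prefix for user fields and the nanosecond timestamp units are assumptions to be checked against the referenced snippet files.
```python
import time

# Each call returns a list of metadata records, not the data itself.
by_id        = consumer.query_messages("_id = 1")
by_ids       = consumer.query_messages("_id >= 4")
string_equal = consumer.query_messages("meta.condition = 'test'")
int_compare  = consumer.query_messages("meta.somevalue > 30 AND meta.somevalue < 60")

# Timestamp constraint: everything from the last 15 minutes (units assumed to be ns).
now_ns = int(time.time() * 10**9)
recent = consumer.query_messages("timestamp >= %d" % (now_ns - 15 * 60 * 10**9))

# The actual data for a found message is retrieved explicitly.
data = consumer.retrieve_data(string_equal[0])
```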
---
title: Simple Consumer
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
This example shows how to consume a message. This page provides snippets for a simple consumer. You can go to BitBucket to see the whole example at once; the files there form a working example ready to launch.
A special access token is needed to create a consumer. For the purpose of this tutorial a special "test" token is used. It will only work for the beamtime called "asapo_test".
The first step is to create an instance of the consumer.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
{ label: 'C', value: 'c', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/consume.py" snippetTag="create"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/consume.cpp" snippetTag="create"
```
</TabItem>
<TabItem value="c">
```c content="./versioned_examples/version-21.12.0/c/consume.c" snippetTag="create"
```
</TabItem>
</Tabs>
You can list all the streams within the beamtime.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
{ label: 'C', value: 'c', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/consume.py" snippetTag="list"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/consume.cpp" snippetTag="list"
```
</TabItem>
</Tabs>
The actual consumption of messages will probably be done in a loop. Here is an example of how such a loop could be organized. It will run until the stream is finished, or until no new messages are received within the timeout.
You need to use a group ID, which can be shared by several consumers working in parallel. You can either generate one or use a random string.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
{ label: 'C', value: 'c', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/consume.py" snippetTag="consume"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/consume.cpp" snippetTag="consume"
```
</TabItem>
<TabItem value="c">
```c content="./versioned_examples/version-21.12.0/c/consume.c" snippetTag="consume"
```
</TabItem>
</Tabs>
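Since the snippet files are referenced rather than shown inline, here is a rough sketch of such a loop in Python, assuming a `consumer` created as above; the exception class names are taken from the 21.12.0 Python API and should be double-checked there.
```python
import asapo_consumer

group_id = consumer.generate_group_id()  # or any fixed string shared by cooperating consumers

while True:
    try:
        data, meta = consumer.get_next(group_id, meta_only=False)
        print("received message", meta["_id"], "size", len(data))  # '_id' field name assumed
    except asapo_consumer.AsapoStreamFinishedError:
        print("stream finished")
        break
    except asapo_consumer.AsapoEndOfStreamError:
        print("stream ended")  # no new data arrived within the timeout
        break
```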
After consuming the stream you can delete it.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
{ label: 'C', value: 'c', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/consume.py" snippetTag="delete"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/consume.cpp" snippetTag="delete"
```
</TabItem>
<TabItem value="c">
```c content="./versioned_examples/version-21.12.0/c/consume.c" snippetTag="delete"
```
</TabItem>
</Tabs>
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
{ label: 'C', value: 'c', },
]
}>
<TabItem value="python">
For the Python example, just launch it with the Python interpreter (make sure the ASAP::O client Python modules are installed).
```
$ python3 consumer.py
```
</TabItem>
<TabItem value="cpp">
For the C++ example you need to compile it first. The easiest way to do this is by installing the ASAP::O client dev packages and using the CMake find_package function. CMake will generate a makefile that you can then use to compile the example.
The example CMake file can look like this.
```cmake content="./versioned_examples/version-21.12.0/cpp/CMakeLists.txt" snippetTag="#consumer"
```
You can use it like this.
```bash
$ cmake . && make
$ ./asapo-consume
```
</TabItem>
<TabItem value="c">
Compile it e.g. using a Makefile and pkg-config (although we recommend CMake - see the C++ section) and execute it. This example assumes ASAPO is installed to /opt/asapo; adjust accordingly.
```makefile content="./versioned_examples/version-21.12.0/c/Makefile" snippetTag="#consumer"
```
```bash
$ make
$ ./asapo-consume
```
</TabItem>
</Tabs>
The details about the received message should appear in the logs, together with the message "stream finished" (if the "finished" flag was sent for the stream). The "stream ended" message will appear for non-finished streams, but may also mean that the stream does not exist (or was deleted).
---
title: Simple Pipeline
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
A consumer and a producer can be combined to create pipelines. Look at the corresponding examples to learn about producers and consumers in detail.
Here is the snippet that shows how to organize a pipelined loop. The full runnable example can be found in the git repository.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/pipeline.py" snippetTag="pipeline"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/pipeline.cpp" snippetTag="pipeline"
```
</TabItem>
</Tabs>
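Conceptually, the pipelined loop combines the consumer loop from the Simple Consumer example with a producer `send` for each processed message, along the lines of the sketch below; `consumer`, `producer`, `group_id` and `callback` are assumed to be created as in those examples, and the metadata field names are assumptions.
```python
import asapo_consumer

while True:
    try:
        data, meta = consumer.get_next(group_id, meta_only=False)
    except (asapo_consumer.AsapoStreamFinishedError,
            asapo_consumer.AsapoEndOfStreamError):
        break

    processed = bytes(data)  # placeholder for the actual processing step

    # Re-emit the processed message with the same id into a second stream.
    producer.send(meta["_id"], "processed/" + meta["name"], processed,
                  stream="processed", callback=callback)

producer.wait_requests_finished(2000)
```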
Just like with any produced stream, the pipelined stream can be marked as "finished". Here's the snippet that shows how to access the last message id in the stream.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/pipeline.py" snippetTag="finish"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/pipeline.cpp" snippetTag="finish"
```
</TabItem>
</Tabs>
The details about the received message should appear in the logs, together with the message "stream finished" (if the "finished" flag was sent for the stream). The "stream ended" message will appear for non-finished streams, but may also mean that the stream does not exist (or was deleted). The processed file should appear in the corresponding folder (by default in ```/var/tmp/asapo/global_shared/data/test_facility/gpfs/test/2019/data/asapo_test```)
---
title: Simple Producer
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
This example produces a simple message. This page provides snippets for a simple producer in both Python and C++. You can go to BitBucket to see the whole example at once; the files there form a working example ready to launch.
The first step is to create an instance of the producer.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/produce.py" snippetTag="create"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/produce.cpp" snippetTag="create"
```
</TabItem>
</Tabs>
Then we need to define a callback that will be used for sending. The callback is called when the message is actually sent, which may happen with a delay.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/produce.py" snippetTag="callback"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/produce.cpp" snippetTag="callback"
```
</TabItem>
</Tabs>
Next we schedule the actual sending. This function call does not perform the sending itself, it only schedules it. The sending will happen in the background, and when it is done the callback will be called (if provided).
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/produce.py" snippetTag="send"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/produce.cpp" snippetTag="send"
```
</TabItem>
</Tabs>
The sending of the messages will probably be done in a loop. After all the data is sent, some additional actions might be needed. You may want to wait for all background requests to finish before doing something else or exiting the application.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
```python content="./versioned_examples/version-21.12.0/python/produce.py" snippetTag="finish"
```
</TabItem>
<TabItem value="cpp">
```cpp content="./versioned_examples/version-21.12.0/cpp/produce.cpp" snippetTag="finish"
```
</TabItem>
</Tabs>
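For reference, the whole flow from the snippets above collapses to only a few lines. The sketch below is indicative: the `create_producer` argument order, the callback signature and the `send` keyword arguments are assumptions based on the 21.12.0 Python API and should be checked against the referenced example files.
```python
import asapo_producer

def callback(header, err):
    # Called from a background thread once the message has been sent (or failed).
    print("sent" if err is None else "error: %s" % err)

producer = asapo_producer.create_producer(
    "localhost:8400", "processed", "asapo_test", "auto",
    "test_source", "<token>", 1, 60000)  # placeholders: beamtime, beamline, source, token, threads, timeout

# Schedule the sending; it happens asynchronously in the background.
producer.send(1, "processed/test_file", b"hello", stream="default", callback=callback)

# Block until all scheduled requests are finished (timeout in ms).
producer.wait_requests_finished(2000)
```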
You can get the full example from BitBucket and test it locally.
<Tabs
groupId="language"
defaultValue="python"
values={[
{ label: 'Python', value: 'python', },
{ label: 'C++', value: 'cpp', },
]
}>
<TabItem value="python">
For the Python example, just launch it with the Python interpreter (make sure the ASAP::O client Python modules are installed).
```bash
$ python3 produce.py
```
</TabItem>
<TabItem value="cpp">
For the C++ example you need to compile it first. The easiest way to do this is by installing the ASAP::O client dev packages and using the CMake find_package function. CMake will generate a makefile that you can then use to compile the example.
The example CMake file can look like this.
```cmake content="./versioned_examples/version-21.12.0/cpp/CMakeLists.txt" snippetTag="#producer"
```
You can use it like this.
```bash
$ cmake . && make
$ ./asapo-produce
```
</TabItem>
</Tabs>
You should see the "successfully sent" message in the logs, and the file should appear in the corresponding folder (by default in ```/var/tmp/asapo/global_shared/data/test_facility/gpfs/test/2019/data/asapo_test```).
---
title: Core Architecture
---
For those who are curious about the ASAPO architecture, the diagram shows some details. The numbered arrows show an example data workflow, explained below.
![Docusaurus](/img/core-architecture.png)
## Data Workflow (example)
The workflow can be split into two more or less independent tasks - data ingestion and data retrieval.
### Data ingestion (numbers with i on the diagram)
1i) As we [know](producer-clients.md), the producer client is responsible for ingesting data into the system. Therefore the first step is to detect that a new message is available. This can be done using another tool developed at DESY named [HiDRA](https://confluence.desy.de/display/FSEC/HiDRA). This tool monitors the source of data (e.g. by monitoring a filesystem or using HTTP requests or ZeroMQ streams, depending on the detector type).
2i) HiDRA (or another user application) then uses the ASAPO Producer API to send messages (M1 and M2 in our case) in parallel to the ASAPO Receiver. TCP/IP or RDMA protocols are used to send data most efficiently. The ASAPO Receiver receives the data into a memory cache.
3i) - 4i) ASAPO saves the data to a filesystem and adds a metadata record to a database.
5i) Feedback is sent to the producer client with a success or error message (in case of an error, some of the steps above may not happen).
### Data retrieval (numbers with r on the diagram)
The [consumer client](consumer-clients.md) is usually a user application that retrieves data from the system to analyse/process it.
The first step to retrieve a message via the Consumer API is to pass the request to the Data Broker (1r). The Data Broker retrieves the metadata information about the message from the database (2r) and returns it to the Consumer Client. The Consumer Client analyses the metadata information and decides how to get the data. If the data is still in the Receiver memory cache, the client requests the data from there via a Data Server (which is part of the ASAPO Receiver). Otherwise, the client gets the data from the filesystem - directly, if the filesystem is accessible on the machine where the client is running, or via the File Transfer Service if not.
---
title: Data in ASAPO
---
All data that is produced, stored and consumed via ASAPO is structured on several levels.
#### Beamtime
This is the top level. It contains all data collected/produced during a single beamtime (beamtime is the term used at DESY; it can also be a run, experiment, proposal, ...). Each beamtime has its own unique ID.
#### Data Source
During a beamtime, data can be produced by different sources. For example, a detector is a data source; if multiple detectors are used during an experiment, they can be different data sources or the same data source (more details below in the datasets section). A user application that simulates or analyses data can also act as an ASAPO data source. Each data source has its own unique name within a beamtime.
#### Data Stream
Each data source can emit multiple data streams. Each stream has a name that is unique within a specific data source.
#### Message
Data streams consist of smaller entities - messages. The content of a message is quite flexible, to cover a broad range of use cases. Usually it is metadata plus some binary data (e.g. a detector image, or an HDF5 file with multiple images). At the moment ASAPO itself is agnostic to the data and sees it as a binary array. Later, some specific cases might be handled as well (the most prominent use case being an HDF5 file with multiple images).
An important aspect is that each message within a data stream must be assigned a consecutive integer index. Therefore, a stream always contains messages with index = 1, 2, 3, ... . This is different from traditional messaging systems, where messages have timestamps or arbitrary unique hash IDs. The reason is that with timestamps the order of messages saved in the system might differ from the order in which they were generated by the data source (e.g. a detector), and keeping the correct order is required in many cases. A second reason is that it makes random access to a specific message quite straightforward.
#### Datasets/Dataset substreams
In some cases multiple detectors are used during an experiment, e.g. a 3D image is composed from multiple 2D images created by different detectors. In this case these 2D images can be combined into a dataset so that it can be processed later as a whole. One would then use a single data source (which would mean a set of detectors, or a "multi-detector", in this case) and a single data stream; to compose a dataset, for each of its components (each 2D image in our example) the corresponding detector would send a message with the same id but to a different dataset substream.
So, for the case without datasets (single detector) the data hierarchy is Beamtime→Data Source → Data Stream → Message:
![Docusaurus](/img/data-in-asapo-workflow.png)
And with datasets (multi-detector) the data hierarchy is Beamtime→Data Source → Data Stream → Dataset→ Message in Dataset Substream:
![Docusaurus](/img/data-in-asapo-workflow2.png)
---
title: Getting Started
slug: /
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
## Start ASAPO services {#step-1}
If you already have running ASAPO services and know the endpoint, you don't need this, and can go to [Client Libraries](#step-2).
Otherwise, for testing purposes one can start ASAPO services in a standalone mode (this is not recommended for production deployment).
The easiest way is to use a Docker container.
So, make sure Docker is installed and you have necessary permissions to use it.
Please note that this will only work on a Linux machine. Also please note that ASAPO needs some ports to be available. You can check the list
[here](https://stash.desy.de/projects/ASAPO/repos/asapo/browse/deploy/asapo_services/scripts/asapo.auto.tfvars.in#37).
Now, depending on how your Docker daemon is configured (whether it uses a
unix socket or a tcp port for communication),
you can pick the corresponding script below, adjust it, and execute it to start the ASAPO services.
<Tabs
defaultValue="unix"
values={[
{ label: 'Docker with unix socket (default)', value: 'unix', },
{ label: 'Docker with tcp (used on FS machines)', value: 'tcp', },
]
}>
<TabItem value="unix">
```shell content="./examples/start_asapo_socket.sh"
```
</TabItem>
<TabItem value="tcp">
```shell content="./versioned_examples/version-21.12.0/start_asapo_tcp.sh"
```
</TabItem>
</Tabs>
At the end you should see
<p className="green-text"><strong>Apply complete! Resources: 19 added, 0 changed, 0 destroyed.</strong></p>
which means ASAPO services successfully started. Your ASAPO endpoint for API calls will be **localhost:8400**.
### Create data directories
Next, you need to create directories where ASAPO will store the data
(the structure matches the one used at DESY experiments).
Since we are going to use beamline `test` and beamtime `asapo_test` in the following examples,
we must create two folders, one for the beamline filesystem and one for the core file system:
```shell
ASAPO_HOST_DIR=/var/tmp/asapo # the folder used in step 1
mkdir -p $ASAPO_HOST_DIR/global_shared/online_data/test/current/raw
mkdir -p $ASAPO_HOST_DIR/global_shared/data/test_facility/gpfs/test/2019/data/asapo_test
```
:::note ASAP::O in production mode
We have a running instance for processing data collected during experiments. Please get in touch with FS-SC group for more information.
:::
### Services shutdown
After you are done with your instance of ASAPO, you might want to gracefully shut down the running services. If you don't, your machine will become bloated with unused Docker images.
```shell content="./versioned_examples/version-21.12.0/cleanup.sh"
```
<br/><br/>
## Install client libraries {#step-2}
Now you can install Python packages or C++ libraries for ASAPO Producer and Consumer API (you need to be in DESY intranet to access files).
<Tabs
defaultValue="python-pip"
values={[
{ label: 'Python - pip', value: 'python-pip', },
{ label: 'Python - packages', value: 'python-packages', },
{ label: 'C++ packages', value: 'cpp', },
]
}>
<TabItem value="python-pip">
```shell content="./versioned_examples/version-21.12.0/install_python_clients_pip.sh" snippetTag="#snippet1"
```
</TabItem>
<TabItem value="python-packages">
```shell content="./versioned_examples/version-21.12.0/install_python_clients_pkg.sh"
```
</TabItem>
<TabItem value="cpp">
```shell content="./versioned_examples/version-21.12.0/install_cpp_clients.sh"
```
</TabItem>
</Tabs>
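A quick way to verify the Python installation is to import the client modules (the module names `asapo_consumer` and `asapo_producer` are the ones used throughout the examples):
```python
# Minimal import check for the freshly installed ASAPO client modules.
import asapo_consumer
import asapo_producer
print("ASAPO client modules imported successfully")
```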
## Code examples
Please refer to the [Code Examples](cookbook/overview) section to see code snippets for various usage scenarios.