Thursday, August 29, 2024

Replace all text except email addresses in Notepad++

To remove all text except for email addresses in Notepad++ (assuming each line has the form `Name <email@example.com>`), you can use the Find and Replace feature with a regular expression. Here’s how to do it:

1. Open Notepad++ and load your file containing the names and email addresses.

2. Open the Find Dialog:

   - Press Ctrl + H to open the Replace tab.

3. Set Up the Regular Expression:

   - In the Find what field, enter the following regex pattern:

     ________________________________

     ^.*?<(.*?)>.*$

     ________________________________

   - In the Replace with field, enter:

     ________________________________

     \1

     ________________________________

4. Configure the Search Mode:

   - Make sure to select Regular expression at the bottom of the dialog.

5. Execute the Replacement:

   - Click Replace All.


Explanation of the Regex

- "^.*?": Matches any text at the beginning of the line up to the first "<".

- "<(.*?)>": Captures the text (the email address) inside the angle brackets.

- ".*$": Matches any text after the closing ">" until the end of the line.

- "\1": Refers to the first captured group, which is the email address.
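The same transformation can be done from the command line with `sed` using the equivalent extended regex; a minimal sketch (the sample input line is made up for illustration):

```shell
# Same idea as the Notepad++ pattern, using sed -E (extended regex):
# everything before "<" and after ">" is dropped, the capture group remains.
echo 'Jane Doe <jane.doe@example.com>' | sed -E 's/^.*<(.*)>.*$/\1/'
# → jane.doe@example.com
```

As in Notepad++, lines that contain no angle brackets are left unchanged, because the pattern simply fails to match them.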


Friday, August 23, 2024

The one-liner below retrieves a list of all listening TCP and UDP connections along with their associated processes, extracts the PIDs of those processes, and, for each PID, prints the PID and the command-line arguments used to start the process.

---------------

for i in $(sudo netstat -tulpn | awk '{print $7}' | cut -d/ -f1 | grep -E '^[0-9]+$' | sort -u); do echo "------ $i -------"; ps -p "$i" -o args=; done

---------------

----> Detailed Explanation

1. sudo netstat -tulpn:

   - sudo: This command is run with superuser privileges, which is necessary to view all network connections and the processes associated with them.

   - netstat: A command-line tool that displays network connections, routing tables, interface statistics, and more.

   - -tulpn: These options specify what information to display:

     - -t: Show TCP connections.

     - -u: Show UDP connections.

     - -l: Show only listening sockets.

     - -p: Show the process ID (PID) and name of the program to which each socket belongs.

     - -n: Show numerical addresses instead of resolving hostnames.

   This command lists all listening TCP and UDP sockets along with the processes that own them.

2. | awk '{print $7}':

   - The output of `netstat` is piped (`|`) into `awk`, a text processing tool.

   - '{print $7}': This command extracts the seventh column from the `netstat` output, which contains the PID and program name in the format `PID/ProgramName`.

3. | cut -d/ -f1:

   - The output from `awk` is further piped into `cut`, which is used to split the string.

   - -d/: Specifies the delimiter as `/`.

   - -f1: Extracts the first field, which is the PID of the process (the part before the `/`).

4. for i in ...; do ...; done:

   - This is a `for` loop that iterates over each PID extracted from the previous commands.

   - $i: Represents the current PID in each iteration of the loop.

5. echo "------ $i -------":

   - This command prints a separator line with the current PID, making the output more readable.

6. ps -p $i -o args=:

   - ps: A command that reports a snapshot of current processes.

   - -p $i: Specifies to show information for the process with the PID stored in `$i`.

   - -o args=: Customizes the output to show only the command line arguments of the process, omitting the header.
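To see what the `awk` and `cut` stages do in isolation, you can feed them a single netstat-style line (the line and the PID in it are fabricated for illustration):

```shell
# A fabricated line in the shape netstat -tulpn produces (column 7 is PID/Program):
line='tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 1234/sshd'

# awk picks the seventh whitespace-separated column:
echo "$line" | awk '{print $7}'                 # → 1234/sshd

# cut splits on "/" and keeps the first field, i.e. the bare PID:
echo "$line" | awk '{print $7}' | cut -d/ -f1   # → 1234
```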

----> Summary of Functionality

This command effectively does the following:

- Retrieves a list of all listening TCP and UDP connections along with their associated processes.

- Extracts the PIDs of these processes from the output of `netstat`.

- For each PID, it prints the PID and the command-line arguments used to start the process.

----> Example Output Interpretation

When you run this command, the output might look something like this:

---------------

------ 1234 -------

/usr/bin/python3 /path/to/script.py

------ 5678 -------

/usr/sbin/nginx -g daemon off;

---------------

In this example:

- The first line indicates that the process with PID `1234` is running a Python script.

- The second line shows that the process with PID `5678` is an instance of Nginx.

----> Conclusion

This command is useful for system administrators or users who want to monitor which processes are listening on network ports and what commands were used to start those processes. It combines several powerful command-line tools to provide a comprehensive view of network activity on the system.
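On distributions where the net-tools package (and thus `netstat`) is no longer installed, `ss` from iproute2 reports the same information. A sketch of the equivalent loop, assuming GNU grep (for `-P`); without root it will only show PIDs of your own processes:

```shell
# ss -tulpn: same flag meanings as netstat -tulpn.
# The PID appears inside a users:(("name",pid=NNNN,fd=N)) field, so it is
# extracted with a Perl-compatible regex instead of awk/cut.
ss -tulpn 2>/dev/null | grep -oP 'pid=\K[0-9]+' | sort -u |
while read -r pid; do
  echo "------ $pid -------"
  ps -p "$pid" -o args=
done
```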

Thursday, August 22, 2024

How to see the complete path of a process running on a specific port in Linux

To find the complete path of a process that is using a specific port in Linux, you can follow these steps:

Step 1: Find the PID of the Process

1. Using the `lsof` command:

   - Open a terminal and run the following command (replace `PORT_NUMBER` with the actual port number):

     --------------------------------------------------------------------

     sudo lsof -i :PORT_NUMBER

     --------------------------------------------------------------------

   - This command will list all processes using the specified port, along with their PIDs.

2. Using the `netstat` command:

   - Alternatively, you can use the `netstat` command:

     --------------------------------------------------------------------

     sudo netstat -tulpn | grep :PORT_NUMBER

     --------------------------------------------------------------------

   - This will show you the PID and the program name associated with the port.

Step 2: Get the Complete Path of the Process

Once you have the PID from the previous step, you can find the complete path of the process:

1. Using the `ps` command:

   - Run the following command (replace `YOUR_PID` with the PID you obtained):

     --------------------------------------------------------------------

     ps -p YOUR_PID -o args=

     --------------------------------------------------------------------

   - This command will display the command line that started the process, including the complete path.

2. Using the `/proc` filesystem:

   - You can also check the `/proc` filesystem for more details:

     --------------------------------------------------------------------

     ls -l /proc/YOUR_PID/exe

     --------------------------------------------------------------------

   - This will show you a symbolic link to the executable of the process, revealing its complete path.
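Both techniques can be tried without root against the current shell itself, whose PID is available as `$$` (the exact output will of course vary by system):

```shell
# $$ expands to the PID of the current shell.
ps -p $$ -o args=            # the command line that started this shell
readlink -f /proc/$$/exe     # absolute path of the shell's executable
```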


Monday, August 5, 2024

Components of #Kafka

Apache Kafka is a powerful distributed messaging system designed for handling real-time data streams. Its architecture consists of several key components that work together to facilitate data production, storage, and consumption. Here’s a breakdown of the main components:

  1. Kafka Broker:
    • A broker is a server that stores messages and serves client requests. Kafka can have multiple brokers in a cluster, allowing it to handle large volumes of data and provide fault tolerance. Each broker manages a portion of the data and can communicate with other brokers to ensure data availability and reliability.
  2. Kafka Producer:
    • The producer is an application that sends data (messages) to Kafka topics. Producers can publish messages to one or more topics, and they are responsible for deciding which topic to send the data to. They can also choose to send messages to specific partitions within a topic for better load balancing.
  3. Kafka Consumer:
    • The consumer is an application that reads messages from Kafka topics. Consumers subscribe to one or more topics and process the incoming data. They can operate individually or as part of a consumer group, where multiple consumers work together to read data from the same topic, allowing for parallel processing.
  4. Kafka Topic:
    • A topic is a category or feed name to which messages are published. Each topic can have multiple partitions, which are segments of the topic that allow for parallel processing and scalability. Messages within a partition are ordered, and each message has a unique identifier called an offset.
  5. Partition:
    • Each topic is divided into partitions to enable scalability and parallelism. Partitions allow Kafka to distribute data across multiple brokers, improving performance and fault tolerance. Each partition is an ordered log of messages.
  6. Zookeeper:
    • Zookeeper is an external service used by Kafka to manage and coordinate the brokers in a cluster. It helps with leader election for partitions, configuration management, and maintaining metadata about the Kafka cluster. While Kafka can function without Zookeeper in some configurations, it is traditionally used for managing cluster state.
  7. Kafka Connect:
    • Kafka Connect is a tool for integrating Kafka with other systems. It allows for the easy import and export of data between Kafka and external data sources or sinks, such as databases, key-value stores, and file systems.
  8. Kafka Streams:
    • Kafka Streams is a client library for building applications and microservices that process and analyze data stored in Kafka. It allows developers to create real-time processing applications that can transform and aggregate data as it flows through Kafka.
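Several of these components meet in the broker's server.properties file. The fragment below is a minimal, hypothetical sketch (all values are placeholders) showing how a broker identifies itself, where partition logs live on disk, and how it reaches Zookeeper:

```properties
# Hypothetical minimal broker configuration (all values are placeholders).
# Unique ID of this broker within the cluster:
broker.id=0
# Directory where partition logs are stored on disk:
log.dirs=/var/lib/kafka/data
# Zookeeper ensemble used for cluster coordination:
zookeeper.connect=localhost:2181
# Default number of partitions for newly created topics:
num.partitions=3
```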

Thursday, August 1, 2024

What is #CDC #Ingestion Load?

Change Data Capture (CDC) ingestion is a method used to track and capture changes made to data in a database. Once these changes are identified, they are sent to another system, like a data warehouse or data lake. This process helps keep data up-to-date across different platforms, making it easier to analyze information in real time.

Key Features of CDC Ingestion

  1. Real-Time Data Movement: CDC allows data to be updated almost instantly. This is important for applications that need the latest information without waiting for scheduled updates.
  2. Efficient Data Synchronization: CDC continuously checks for changes in the source database and quickly updates the target systems. This means less strain on the original database and less traffic on the network.
  3. Works with Different Databases: CDC can be used with various types of databases, including popular ones like SQL Server and Oracle, as well as other systems that handle transactions.
  4. Log-Based Tracking: Many CDC systems use transaction logs to monitor changes. This method captures not just the current data but also its history, giving a complete picture of how data has changed over time.
  5. Integration with ETL Tools: CDC often works alongside ETL (Extract, Transform, Load) processes, which help move data from one place to another without causing too much disruption.

Benefits of CDC Ingestion

  • Better Analytics: By keeping data current, CDC helps organizations make informed decisions based on the latest information.
  • Less Delay: Unlike older methods that might take time to sync data, CDC provides updates almost immediately, making data more useful.
  • Scalability: CDC can grow with the organization, handling more data as needed without losing performance.

In summary, CDC ingestion is an effective way to manage changes in data, allowing organizations to use their data for timely insights and better decision-making.
