检查大文件是否包含另一个(较小的)文件

检查大文件是否包含另一个(较小的)文件

我有一个 3MB 和一个 5MB 的文本文件。我想确保较大的文件包含较小文件中的所有行。

输出需要显示较小文件的所有未包含在较大文件中的行。我尝试将它们与 Notepad++ 进行比较,但它挂起了。Word 2007 比较很难理解。

我尝试了 Beyond Compare、WinMerge、fc 等。两个文件中的行顺序不同 - 因此比较工具说行不同,但大文件中的同一行位于不同位置。想想小文件是这样的 -

abc def
ghi jkl
mno pqr
yza bcd

想想大文件是这样的 -

efg hij
mno pqr
ghi jkl
abc def
stu vwx

我想要输出这个-

yza bcd

答案1

使用比较工具无可比拟KDiff3, 或者强制应该足够了。

更新:

今天早上我心情大好,所以我把这个给你。应该可以满足你的要求。

在此处输入图片描述

一些说明:

1.) 此代码将处理重复项。例如,如果包含相同文本的行在小文件中出现两次,则预计该行也会在大文件中出现两次。

2.)此代码根据您的使用情况忽略行排序。

3.) 关于文件末尾空白行的小错误,我不想弄乱它。此代码将空白行视为与任何其他行一样的行,只要它不在文件末尾即可,在这种情况下允许一个空白行(并忽略)。例如,如果小文件在文件末尾有 3 个空白行,没有其他空白行,那么大文件预计在其他行中间至少有 2 个空白行,或者在文件末尾有 3 个空白行。

跑步:

1.)确保你有一个JDK已安装

2.) 确保 java 在您的路径中。如果您使用的是 Windows 系统,请转到控制面板 > 系统 > 高级系统设置 > 环境变量,然后选择Path系统变量部分。将 JDK bin 文件夹的位置附加到路径变量中,确保用分号将其与上一个条目分开。如下所示:

C:\Program Files (x86)\Java\jdk1.6.0_38\bin;

3.)将以下代码复制到名为FileLineComparator.java

4.) 打开命令提示符并导航到刚刚创建的文件的目录

5.)类型javac FileLineComparator.java

6.)类型java -cp . FileLineComparator

7.)享受吧!

import java.io.*;
import java.util.ArrayList;
import javax.swing.JFileChooser;
import javax.swing.JOptionPane;

public class FileLineComparator extends javax.swing.JFrame {

    public FileLineComparator() {
        initComponents();
    }

    @SuppressWarnings( "unchecked" )
    // <editor-fold defaultstate="collapsed" desc="Generated Code">
    private void initComponents() {

        fileChooser = new javax.swing.JFileChooser();
        smallFileTextField = new javax.swing.JTextField();
        smallFileLabel = new javax.swing.JLabel();
        largeFileLabel = new javax.swing.JLabel();
        largeFileTextField = new javax.swing.JTextField();
        outputFileLabel = new javax.swing.JLabel();
        outputFileTextField = new javax.swing.JTextField();
        goButton = new javax.swing.JButton();
        smallFileButton = new javax.swing.JButton();
        largeFileButton = new javax.swing.JButton();
        outputFileButton = new javax.swing.JButton();

        setDefaultCloseOperation(javax.swing.WindowConstants.EXIT_ON_CLOSE);

        smallFileLabel.setText("Small text file:");

        largeFileLabel.setText("Large text file:");

        outputFileLabel.setText("Output file:");

        goButton.setText("Go!");
        goButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                goButtonMouseClicked(evt);
            }
        });

        smallFileButton.setText("Browse");
        smallFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                smallFileButtonMouseClicked(evt);
            }
        });

        largeFileButton.setText("Browse");
        largeFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                largeFileButtonMouseClicked(evt);
            }
        });

        outputFileButton.setText("Browse");
        outputFileButton.addMouseListener(new java.awt.event.MouseAdapter() {
            public void mouseClicked(java.awt.event.MouseEvent evt) {
                outputFileButtonMouseClicked(evt);
            }
        });

        javax.swing.GroupLayout layout = new javax.swing.GroupLayout(getContentPane());
        getContentPane().setLayout(layout);
        layout.setHorizontalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING, false)
                    .addGroup(layout.createSequentialGroup()
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addGroup(layout.createSequentialGroup()
                                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                                    .addComponent(largeFileLabel)
                                    .addComponent(smallFileLabel))
                                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.TRAILING, false)
                                    .addComponent(outputFileTextField, javax.swing.GroupLayout.Alignment.LEADING, javax.swing.GroupLayout.DEFAULT_SIZE, 194, Short.MAX_VALUE)
                                    .addComponent(largeFileTextField, javax.swing.GroupLayout.Alignment.LEADING)
                                    .addComponent(smallFileTextField)))
                            .addComponent(outputFileLabel))
                        .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                        .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
                            .addComponent(largeFileButton)
                            .addComponent(smallFileButton)
                            .addComponent(outputFileButton)))
                    .addComponent(goButton, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
                .addContainerGap(16, Short.MAX_VALUE))
        );
        layout.setVerticalGroup(
            layout.createParallelGroup(javax.swing.GroupLayout.Alignment.LEADING)
            .addGroup(layout.createSequentialGroup()
                .addContainerGap()
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(smallFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(smallFileLabel)
                    .addComponent(smallFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(largeFileLabel)
                    .addComponent(largeFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(largeFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addGroup(layout.createParallelGroup(javax.swing.GroupLayout.Alignment.BASELINE)
                    .addComponent(outputFileTextField, javax.swing.GroupLayout.PREFERRED_SIZE, javax.swing.GroupLayout.DEFAULT_SIZE, javax.swing.GroupLayout.PREFERRED_SIZE)
                    .addComponent(outputFileLabel)
                    .addComponent(outputFileButton))
                .addPreferredGap(javax.swing.LayoutStyle.ComponentPlacement.RELATED)
                .addComponent(goButton, javax.swing.GroupLayout.PREFERRED_SIZE, 62, javax.swing.GroupLayout.PREFERRED_SIZE)
                .addContainerGap(javax.swing.GroupLayout.DEFAULT_SIZE, Short.MAX_VALUE))
        );

        pack();
    }// </editor-fold>

    private void smallFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.SMALL );
    }

    private void largeFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.LARGE );
    }

    private void outputFileButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        setSelectedFile( FILE_TYPES.OUTPUT );
    }

    private void goButtonMouseClicked( java.awt.event.MouseEvent evt ) {
        errorStub = new StringBuilder();
        smallFile = new File( smallFileTextField.getText() );
        smallFileTextField.setText( smallFile.getAbsolutePath() );
        largeFile = new File( largeFileTextField.getText() );
        largeFileTextField.setText( largeFile.getAbsolutePath() );
        outputFile = new File( outputFileTextField.getText() );
        outputFileTextField.setText( outputFile.getAbsolutePath() );
        process();
    }

    private void setSelectedFile( FILE_TYPES fileType ) {
        int returnVal = fileChooser.showOpenDialog( null );
        if( returnVal == JFileChooser.APPROVE_OPTION ) {
            File file = fileChooser.getSelectedFile();
            switch( fileType ) {
                case SMALL:
                    smallFileTextField.setText( file.getPath() );
                    break;
                case LARGE:
                    largeFileTextField.setText( file.getPath() );
                    break;
                case OUTPUT:
                    outputFileTextField.setText( file.getPath() );
                    break;
            }
        }
    }

    private void process() {
        ArrayList<String> smallFileLines = readFileLines( smallFile );
        ArrayList<String> largeFileLines = readFileLines( largeFile );
        ArrayList<String> outputFileLines = new ArrayList<String>();

        for( String line : smallFileLines ) {
            if( !largeFileLines.contains( line ) ) {
                outputFileLines.add( line );
            } else {
                largeFileLines.remove( line );
            }
        }

        if( errorStub.length() == 0 ) {
            writeOutput( outputFileLines );
        }

        if( errorStub.length() == 0 ) {
            JOptionPane.showMessageDialog( null, "Finished Successfully!" );
        } else {
            JOptionPane.showMessageDialog( null, errorStub.toString() );
        }
    }

    private ArrayList<String> readFileLines( File file ) {
        ArrayList<String> al = new ArrayList<String>();
        try {
            FileReader fr = new FileReader( file );
            BufferedReader bufRdr = new BufferedReader( fr );
            String line = null;
            while( ( line = bufRdr.readLine() ) != null ) {
                al.add( line );
            }
            bufRdr.close();
        } catch( IOException ioex ) {
            errorStub.append( String.format( "Error reading file %s\r\n", file.getAbsolutePath() ) );
            System.err.println( ioex.getMessage() );
        }
        return al;
    }

    private void writeOutput( ArrayList<String> outputFileLines ) {
        try {
            FileWriter fw = new FileWriter( outputFile );
            BufferedWriter bw = new BufferedWriter( fw );
            for( int i = 0; i < outputFileLines.size(); i++ ) {
                String line = String.format( "%s%s", outputFileLines.get( i ), i + 1 == outputFileLines.size() ? "" : "\r\n" );
                bw.write( line );
            }
            bw.close();
        } catch( Exception ex ) {
            errorStub.append( String.format( "Error writing file %s\r\n", outputFile.getAbsolutePath() ) );
            System.err.println( ex.getMessage() );
        }
    }

    public static void main( String args[] ) {
        try {
            for( javax.swing.UIManager.LookAndFeelInfo info : javax.swing.UIManager.getInstalledLookAndFeels() ) {
                if( "Nimbus".equals( info.getName() ) ) {
                    javax.swing.UIManager.setLookAndFeel( info.getClassName() );
                    break;
                }
            }
        } catch( ClassNotFoundException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( InstantiationException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( IllegalAccessException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        } catch( javax.swing.UnsupportedLookAndFeelException ex ) {
            java.util.logging.Logger.getLogger( FileLineComparator.class.getName() ).log( java.util.logging.Level.SEVERE, null, ex );
        }

        java.awt.EventQueue.invokeLater( new Runnable() {

            public void run() {
                new FileLineComparator().setVisible( true );
            }
        } );
    }

    private enum FILE_TYPES {
        SMALL,
        LARGE,
        OUTPUT
    }

    private File smallFile = null;
    private File largeFile = null;
    private File outputFile = null;

    private StringBuilder errorStub = null;

    // Variables declaration - do not modify
    private javax.swing.JFileChooser fileChooser;
    private javax.swing.JButton goButton;
    private javax.swing.JButton largeFileButton;
    private javax.swing.JLabel largeFileLabel;
    private javax.swing.JTextField largeFileTextField;
    private javax.swing.JButton outputFileButton;
    private javax.swing.JLabel outputFileLabel;
    private javax.swing.JTextField outputFileTextField;
    private javax.swing.JButton smallFileButton;
    private javax.swing.JLabel smallFileLabel;
    private javax.swing.JTextField smallFileTextField;
    // End of variables declaration
}

答案2

合并非常适合比较文本文件。

相关内容